An R Introduction to Statistics

Cumulative Frequency Distribution

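The cumulative frequency distribution of a quantitative variable gives, for each of a set of chosen levels, the number of data values that fall below that level. Equivalently, it is the running total of the frequencies of successive class intervals.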
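As a worked formula (the symbols here are our own notation, not part of the original text): if the class intervals are listed in increasing order and the k-th interval has frequency f_k, the cumulative frequency F_k is the sum of the first k frequencies. With the eruption counts from the example below, the first interval holds 51 eruptions and the second holds 41, so the second cumulative frequency is 92.

```latex
F_k = \sum_{i=1}^{k} f_i = F_{k-1} + f_k, \qquad F_0 = 0,
\qquad\text{e.g.}\quad F_2 = f_1 + f_2 = 51 + 41 = 92 .
```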

Example

In the data set faithful, the cumulative frequency distribution of the eruptions variable shows, for each of a set of chosen levels, the total number of eruptions whose durations are less than that level.
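The definition can be checked directly, before any binning, by counting the eruptions shorter than each level. This is a minimal sketch; the cutoffs are assumed to be the upper ends of the half-minute intervals used in the Solution, and the name cutoffs is our own. The counts should match the totals shown in the Answer.

```r
# Eruption durations in minutes (faithful ships with R's datasets package)
duration <- faithful$eruptions

# Assumed levels: upper ends of the intervals [1.5,2), [2,2.5), ..., [5,5.5)
cutoffs <- seq(2, 5.5, by = 0.5)

# For each level, count the eruptions strictly shorter than it
counts <- sapply(cutoffs, function(x) sum(duration < x))
names(counts) <- cutoffs
counts
```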

Problem

Find the cumulative frequency distribution of the eruption durations in faithful.

Solution

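We first find the frequency distribution of the eruption durations by grouping them into half-minute intervals from 1.5 to 5.5 minutes. The argument right=FALSE makes each interval closed on the left and open on the right, as in [2,2.5). Further details can be found in the Frequency Distribution tutorial.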

> duration = faithful$eruptions 
> breaks = seq(1.5, 5.5, by=0.5) 
> duration.cut = cut(duration, breaks, right=FALSE) 
> duration.freq = table(duration.cut)
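Printing the per-interval frequencies before accumulating them makes the next step easier to follow. The sketch below repeats the commands above in <- style; the comments are ours. Every eruption falls into exactly one interval, so the frequencies sum to the 272 observations.

```r
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)

# right = FALSE: each interval is [a, b), closed left and open right
duration.cut <- cut(duration, breaks, right = FALSE)
duration.freq <- table(duration.cut)

# Count of eruptions in each half-minute interval
duration.freq

# Total over all intervals equals the number of eruptions
sum(duration.freq)
```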

We then apply the cumsum function to compute the cumulative frequency distribution.

> duration.cumfreq = cumsum(duration.freq)
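To see what cumsum does, here is an explicit running-total loop over the interval frequencies. It is only an illustration; the names running and total are hypothetical, and in practice cumsum is the idiomatic choice.

```r
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)
duration.freq <- table(cut(duration, breaks, right = FALSE))

# Each entry adds the current interval's count to the previous total
running <- numeric(length(duration.freq))
total <- 0
for (i in seq_along(duration.freq)) {
  total <- total + duration.freq[[i]]
  running[i] <- total
}
names(running) <- names(duration.freq)
running

# cumsum() yields the same totals, element by element
all(running == cumsum(duration.freq))
```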

Answer

The cumulative frequency distribution of the eruption durations is:

> duration.cumfreq 
[1.5,2) [2,2.5) [2.5,3) [3,3.5) [3.5,4) [4,4.5) [4.5,5) 
     51      92      97     104     134     207     268 
[5,5.5) 
    272
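The result is a named vector, so a single entry can be read off by its interval label. For example, the entry for [4,4.5) counts the eruptions shorter than 4.5 minutes, and the last entry counts every eruption, so it equals the sample size. A short sketch:

```r
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)
duration.cumfreq <- cumsum(table(cut(duration, breaks, right = FALSE)))

# Eruptions lasting less than 4.5 minutes
duration.cumfreq[["[4,4.5)"]]

# The final cumulative frequency equals the number of rows in faithful
duration.cumfreq[[length(duration.cumfreq)]] == nrow(faithful)
```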

Enhanced Solution

We apply the cbind function to print the result in column format.

> cbind(duration.cumfreq) 
        duration.cumfreq 
[1.5,2)               51 
[2,2.5)               92 
[2.5,3)               97 
[3,3.5)              104 
[3.5,4)              134 
[4,4.5)              207 
[4.5,5)              268 
[5,5.5)              272
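cbind returns a one-column matrix, which is convenient for printing. If the result is needed for further processing, a data frame with an explicit interval column may be handier. This is an optional variant; the names cumfreq.df, interval and cumfreq are our own choices.

```r
duration.cumfreq <- cumsum(table(cut(faithful$eruptions,
                                     seq(1.5, 5.5, by = 0.5), right = FALSE)))

# One row per interval: its label and its cumulative frequency
cumfreq.df <- data.frame(interval = names(duration.cumfreq),
                         cumfreq = as.vector(duration.cumfreq))
cumfreq.df
```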

Exercise

Find the cumulative frequency distribution of the eruption waiting periods in faithful.
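One possible approach follows the same steps as above. This is a sketch only: the class width of 10 minutes is an assumption, and the breaks are derived from the data range so that every waiting period falls inside some [a, b) interval.

```r
waiting <- faithful$waiting

# Assumed class width of 10 minutes; breaks cover the full data range
lo <- floor(min(waiting) / 10) * 10
hi <- floor(max(waiting) / 10) * 10 + 10
breaks <- seq(lo, hi, by = 10)

# Frequency distribution, then running totals
waiting.cut <- cut(waiting, breaks, right = FALSE)
waiting.cumfreq <- cumsum(table(waiting.cut))
cbind(waiting.cumfreq)
```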