An R Introduction to Statistics

Cumulative Relative Frequency Distribution

The cumulative relative frequency distribution of a quantitative variable summarizes the proportion of observations that fall below each of a set of given levels.

The cumulative relative frequency is the cumulative frequency divided by the sample size:

\[ \text{Cumulative Relative Frequency} = \frac{\text{Cumulative Frequency}}{\text{Sample Size}} \]
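
As a minimal sketch of the formula, the following uses made-up class counts (the vector freq is hypothetical and not taken from faithful) to show how cumulative counts turn into cumulative proportions.

```r
# Hypothetical class frequencies, for illustration only
freq <- c(2, 3, 5)

# Cumulative frequency: running total of the counts
cumfreq <- cumsum(freq)

# Cumulative relative frequency: divide by the sample size
cumrelfreq <- cumfreq / sum(freq)
print(cumrelfreq)
```

The last value is always 1, because the final cumulative frequency equals the sample size.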

Example

In the data set faithful, the cumulative relative frequency distribution of the eruptions variable shows the proportion of eruptions whose durations fall below each of a set of chosen levels.

Problem

Find the cumulative relative frequency distribution of the eruption durations in faithful.

Solution

We first find the frequency distribution of the eruption durations as follows. Further details can be found in the Frequency Distribution tutorial.

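> duration = faithful$eruptions 
> breaks = seq(1.5, 5.5, by=0.5)    # interval endpoints 1.5, 2.0, ..., 5.5
> duration.cut = cut(duration, breaks, right=FALSE)    # left-closed intervals [a, b)
> duration.freq = table(duration.cut)    # count of eruptions per interval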

We then apply the cumsum function to compute the cumulative frequency distribution.

> duration.cumfreq = cumsum(duration.freq)
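
A quick sanity check can help at this step. The sketch below, which rebuilds the objects so it runs on its own, confirms that no observation fell outside the chosen breaks and that the last cumulative count equals the number of observations. The variable names n.missing and last.count are introduced here for illustration.

```r
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)
duration.cut <- cut(duration, breaks, right = FALSE)
duration.cumfreq <- cumsum(table(duration.cut))

# cut() returns NA for values outside the breaks; count them
n.missing <- sum(is.na(duration.cut))

# The final running total should match the sample size
last.count <- unname(duration.cumfreq[length(duration.cumfreq)])

print(n.missing)
print(last.count == length(duration))
```

If the breaks did not cover the whole range of the data, the final cumulative count would fall short of the sample size and the proportions computed later would never reach 1.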

We then find the sample size of faithful with the nrow function and divide the cumulative frequency distribution by it to obtain the cumulative relative frequency distribution:

> duration.cumrelfreq = duration.cumfreq / nrow(faithful)
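
The same proportions can be cross-checked without cut or table. The sketch below is an alternative check, not part of the original solution: because the intervals are closed on the left (right=FALSE), each cumulative relative frequency is the proportion of durations strictly below the upper endpoint of its interval. The name direct is introduced here for illustration.

```r
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)

# Tutorial approach: bin, count, accumulate, divide by sample size
duration.cumfreq <- cumsum(table(cut(duration, breaks, right = FALSE)))
duration.cumrelfreq <- duration.cumfreq / nrow(faithful)

# Direct approach: proportion of durations strictly below each upper endpoint
direct <- sapply(breaks[-1], function(b) mean(duration < b))

# Both approaches should agree
print(all.equal(unname(duration.cumrelfreq), direct))
```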

Answer

The cumulative relative frequency distribution of the eruptions variable is:

> duration.cumrelfreq 
[1.5,2) [2,2.5) [2.5,3) [3,3.5) [3.5,4) [4,4.5) [4.5,5) 
0.18750 0.33824 0.35662 0.38235 0.49265 0.76103 0.98529 
[5,5.5) 
1.00000

Enhanced Solution

We can make the output more readable by printing fewer digits, which is controlled by the digits option.

> old = options(digits=2) 
> duration.cumrelfreq 
[1.5,2) [2,2.5) [2.5,3) [3,3.5) [3.5,4) [4,4.5) [4.5,5) 
   0.19    0.34    0.36    0.38    0.49    0.76    0.99 
[5,5.5) 
   1.00 
> options(old)    # restore the old option
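
An alternative that avoids changing session-wide settings is to round the values themselves. The sketch below assumes two decimal places are wanted. Note that round fixes the number of decimal places, whereas the digits option controls significant digits, so the two need not agree for every data set. The name duration.rounded is introduced here for illustration.

```r
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)
duration.cumfreq <- cumsum(table(cut(duration, breaks, right = FALSE)))
duration.cumrelfreq <- duration.cumfreq / nrow(faithful)

# Round to two decimal places; global options are left untouched
duration.rounded <- round(duration.cumrelfreq, 2)
print(duration.rounded)
```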

We then apply the cbind function to print both the cumulative frequency distribution and the cumulative relative frequency distribution in parallel columns.

> old = options(digits=2) 
> cbind(duration.cumfreq, duration.cumrelfreq) 
        duration.cumfreq duration.cumrelfreq 
[1.5,2)               51                0.19 
[2,2.5)               92                0.34 
[2.5,3)               97                0.36 
[3,3.5)              104                0.38 
[3.5,4)              134                0.49 
[4,4.5)              207                0.76 
[4.5,5)              268                0.99 
[5,5.5)              272                1.00 
> options(old)
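
If the summary is needed for further processing rather than just printing, a data frame may be more convenient than the matrix that cbind returns. The following is a sketch of one possible layout; the object name duration.summary and its column names are choices made here, not part of the original solution.

```r
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)
duration.cumfreq <- cumsum(table(cut(duration, breaks, right = FALSE)))

# One row per interval, with the interval label kept as its own column
duration.summary <- data.frame(
  interval   = names(duration.cumfreq),
  cumfreq    = as.vector(duration.cumfreq),
  cumrelfreq = round(as.vector(duration.cumfreq) / length(duration), 2)
)
print(duration.summary, row.names = FALSE)
```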

Exercise

Find the cumulative relative frequency distribution of the eruption waiting periods in faithful.