# Frequency Distribution of Quantitative Data

The frequency distribution of a data variable is a summary of how often the data values occur within a collection of non-overlapping categories.

#### Example

In the data set faithful, the frequency distribution of the eruptions variable is the summary of eruptions according to some classification of the eruption durations.

#### Problem

Find the frequency distribution of the eruption durations in faithful.

#### Solution

The solution consists of the following steps:

1. We first find the range of eruption durations with the range function. It shows that the observed eruptions are between 1.6 and 5.1 minutes in duration.
> duration = faithful$eruptions
> range(duration)
[1] 1.6 5.1
2. Break the range into non-overlapping sub-intervals by defining a sequence of equally spaced break points. If we round the lower endpoint of the interval [1.6, 5.1] down and the upper endpoint up to the nearest multiple of 0.5, we obtain the interval [1.5, 5.5]. Hence we set the break points to be the sequence { 1.5, 2.0, 2.5, ..., 5.5 }.
> breaks = seq(1.5, 5.5, by=0.5)    # break points at multiples of 0.5
> breaks
[1] 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
3. Classify the eruption durations into the sub-intervals of length 0.5 with the cut function. As the intervals are to be closed on the left and open on the right, we set the right argument to FALSE.
> duration.cut = cut(duration, breaks, right=FALSE)
4. Compute the frequency of eruptions in each sub-interval with the table function.
> duration.freq = table(duration.cut)

The frequency distribution of the eruption duration is:

> duration.freq
duration.cut
[1.5,2) [2,2.5) [2.5,3) [3,3.5) [3.5,4) [4,4.5) [4.5,5)
     51      41       5       7      30      73      61
[5,5.5)
      4
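
The four steps above can be collected into one short script. This is only a minimal consolidation, assuming the built-in faithful data set is available. The two stopifnot checks at the end are an addition for illustration, not part of the original solution; they confirm that no duration falls outside the break points and that the frequencies add up to the number of observations.

```r
# Step 1: extract the eruption durations and inspect their range
duration = faithful$eruptions
range(duration)

# Step 2: break points at multiples of 0.5 that cover the range
breaks = seq(1.5, 5.5, by=0.5)

# Step 3: assign each duration to a left-closed, right-open sub-interval
duration.cut = cut(duration, breaks, right=FALSE)

# Step 4: count the eruptions in each sub-interval
duration.freq = table(duration.cut)

# Sanity checks (added for illustration): every value was binned,
# and the counts sum to the number of observations
stopifnot(!anyNA(duration.cut))
stopifnot(sum(duration.freq) == length(duration))
```

If a break sequence fails to cover the data, cut returns NA for the values outside it and table silently drops them, so a check of this kind is a cheap safeguard.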

#### Enhanced Solution

We apply the cbind function to print the result in column format.

> cbind(duration.freq)
        duration.freq
[1.5,2)            51
[2,2.5)            41
[2.5,3)             5
[3,3.5)             7
[3.5,4)            30
[4,4.5)            73
[4.5,5)            61
[5,5.5)             4
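
To see why this prints as a column, it may help to inspect what cbind returns. The sketch below rebuilds the frequency table so that it runs on its own, then looks at the shape and row names of the result. The name freq.col is hypothetical and used only for illustration.

```r
# Rebuild the frequency table from the solution
duration = faithful$eruptions
breaks = seq(1.5, 5.5, by=0.5)
duration.freq = table(cut(duration, breaks, right=FALSE))

# cbind treats the one-dimensional table as a single column;
# the sub-interval labels become the row names
freq.col = cbind(duration.freq)
dim(freq.col)         # number of rows and columns
rownames(freq.col)    # the sub-interval labels
```

The result is a one-column matrix, which R prints with one sub-interval per row.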

#### Note

Per the R documentation of the cut function, hist(x, breaks, plot=FALSE) is more efficient and uses less memory than table(cut(x, breaks)), so you are advised to use the hist function to find the frequency distribution.
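
As a rough illustration of that advice, the sketch below computes the same counts with hist and plot=FALSE and compares them with the cut and table result. One point to flag: hist closes its intervals on the right by default, so right=FALSE is passed here to match the left-closed intervals used in the solution. The name duration.hist is hypothetical.

```r
duration = faithful$eruptions
breaks = seq(1.5, 5.5, by=0.5)

# Frequencies via cut and table, as in the solution
duration.freq = table(cut(duration, breaks, right=FALSE))

# Frequencies via hist, without drawing the plot
duration.hist = hist(duration, breaks=breaks, right=FALSE, plot=FALSE)
duration.hist$counts

# Check that both approaches give the same counts
all(duration.hist$counts == as.vector(duration.freq))
```

By default hist also includes a value equal to the last break point in the final interval (include.lowest=TRUE). That makes no difference here, because the longest eruption is 5.1 minutes.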

#### Exercise

1. Find the frequency distribution of the eruption waiting periods in faithful.
2. Programmatically find the duration sub-interval that has the most eruptions.