# Frequency Distribution of Quantitative Data

The frequency distribution of a data variable is a summary of how often the data values occur in each of a collection of non-overlapping categories.
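To make the definition concrete, here is a minimal R sketch on a small made-up vector. The values and the names x, categories, and freq are hypothetical and not taken from any data set. cut assigns each value to one of three non-overlapping categories, and table counts the values in each category.

```r
# Hypothetical toy data: eight made-up measurements
x <- c(1.2, 1.7, 2.1, 2.4, 2.6, 2.9, 3.3, 3.8)

# Three non-overlapping, left-closed categories: [1,2), [2,3), [3,4)
categories <- cut(x, breaks = c(1, 2, 3, 4), right = FALSE)

# Count how many values fall into each category
freq <- table(categories)
freq   # [1,2): 2, [2,3): 4, [3,4): 2
```

Every value lands in exactly one category, so the counts add up to length(x).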

#### Example

In the data set faithful, the frequency distribution of the eruptions variable summarizes how many eruptions fall into each class of a chosen classification of the eruption durations.

#### Problem

Find the frequency distribution of the eruption durations in faithful.

#### Solution

The solution consists of the following steps:

- Find the range of eruption durations with the range function. It shows that the observed eruptions last between 1.6 and 5.1 minutes.
- Break the range into non-overlapping sub-intervals by defining a sequence of equally spaced break points. Rounding the lower endpoint of [1.6, 5.1] down and the upper endpoint up to the nearest multiples of 0.5 gives the interval [1.5, 5.5]. Hence we set the break points to the sequence { 1.5, 2.0, 2.5, ..., 5.5 }.
- Classify the eruption durations into the half-unit-length sub-intervals with cut. Because the intervals are to be closed on the left and open on the right, set the right argument of cut to FALSE.
- Compute the frequency of eruptions in each sub-interval with the table function.
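The four steps above can be sketched in R as follows. This is a minimal sketch, assuming the built-in faithful data frame; the intermediate names r, lo, and hi are illustrative choices, and computing the break points from the range (rather than typing them in) is one way to perform the outward rounding described in the second step.

```r
# faithful ships with base R (package datasets); eruptions is in minutes
duration <- faithful$eruptions

# Step 1: range of the observed durations
r <- range(duration)
r                                  # [1] 1.6 5.1

# Step 2: round the endpoints outward to multiples of 0.5, then build
# equally spaced break points
lo <- floor(2 * r[1]) / 2          # 1.6 rounds down to 1.5
hi <- ceiling(2 * r[2]) / 2        # 5.1 rounds up to 5.5
breaks <- seq(lo, hi, by = 0.5)    # 1.5 2.0 2.5 ... 5.5

# Step 3: classify each duration; right = FALSE makes intervals [a, b)
duration.cut <- cut(duration, breaks, right = FALSE)

# Step 4: count the eruptions in each sub-interval
duration.freq <- table(duration.cut)
duration.freq
```

As a sanity check, sum(duration.freq) should equal nrow(faithful), which is 272, because every eruption falls into exactly one sub-interval.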

#### Answer

The frequency distribution of the eruption durations is:

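| duration.cut | [1.5,2) | [2,2.5) | [2.5,3) | [3,3.5) | [3.5,4) | [4,4.5) | [4.5,5) | [5,5.5) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Frequency | 51 | 41 | 5 | 7 | 30 | 73 | 61 | 4 |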

#### Enhanced Solution

We apply the cbind function to print the result in column format.
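A self-contained sketch of this step, assuming the same half-minute break points as in the solution (the name freq.column is illustrative). cbind turns the one-dimensional frequency table into a one-column matrix, so each interval prints on its own row with the count beside it.

```r
# Rebuild the frequency table from the solution
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)
duration.freq <- table(cut(duration, breaks, right = FALSE))

# One-column matrix: interval labels become row names,
# and the column is named after the argument, duration.freq
freq.column <- cbind(duration.freq)
freq.column
```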

| Interval (minutes) | duration.freq |
| --- | --- |
| [1.5,2) | 51 |
| [2,2.5) | 41 |
| [2.5,3) | 5 |
| [3,3.5) | 7 |
| [3.5,4) | 30 |
| [4,4.5) | 73 |
| [4.5,5) | 61 |
| [5,5.5) | 4 |

#### Note

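The R documentation for cut notes that hist(x, breaks, plot = FALSE) is more efficient and less memory-hungry than table(cut(x, breaks)), so hist is the recommended way to compute a frequency distribution when performance matters.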
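A sketch of the hist alternative, assuming the same half-minute break points as in the solution (the name h is illustrative). Passing plot = FALSE returns the histogram object without drawing it, and right = FALSE makes the bins left-closed like the cut classification in the solution. One caveat: hist defaults to include.lowest = TRUE, which with right = FALSE closes the last bin on both ends, [5, 5.5]; since the longest eruption is 5.1 minutes, the counts are unaffected.

```r
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)

# Compute bin counts without plotting; bins are [a, b) as with cut
h <- hist(duration, breaks = breaks, right = FALSE, plot = FALSE)

# h$counts holds one count per sub-interval, in order of breaks
h$counts
```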

#### Exercise

- Find the frequency distribution of the eruption waiting periods in faithful.
- Programmatically find the duration sub-interval with the most eruptions.