Population Mean Between Two Independent Samples

Two data samples are independent if they come from unrelated populations and the samples do not affect each other. Here, we assume that the data populations follow the normal distribution. Using the unpaired t-test, we can obtain an interval estimate of the difference between two population means.
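
For reference, the interval that the unpaired (Welch) t-test produces can be written out as below. This is a sketch of the standard unequal-variance formula, not something spelled out in this tutorial; the symbols (sample means \bar{x}_i, sample variances s_i^2, sample sizes n_i, and degrees of freedom \nu) are notation introduced here for illustration.

```latex
% Confidence interval at level 1 - \alpha for \mu_1 - \mu_2
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{\alpha/2,\,\nu}
  \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

% Welch-Satterthwaite approximation of the degrees of freedom
\nu = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

The value of \nu is generally not an integer, which is why t.test can report a fractional df.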

Example

The data frame column mpg of the data set mtcars contains gas mileage data for various automobiles, taken from the 1974 Motor Trend US magazine.

> mtcars$mpg
[1] 21.0 21.0 22.8 21.4 18.7 ...

Meanwhile, another data column in mtcars, named am, indicates the transmission type of the automobile model (0 = automatic, 1 = manual).

> mtcars$am
[1] 1 1 1 0 0 0 0 0 ...

In particular, the gas mileages of manual and automatic transmissions form two independent data populations.

Problem

Assuming that the data in mtcars follows the normal distribution, find the 95% confidence interval estimate of the difference between the mean gas mileage of manual and automatic transmissions.

Solution

As mentioned in the tutorial Data Frame Row Slice, the gas mileage for automatic transmission can be listed as follows:

> L = mtcars$am == 0
> mpg.auto = mtcars[L,]$mpg
> mpg.auto # automatic transmission mileage
[1] 21.4 18.7 18.1 14.3 24.4 ...

By applying the negation of L, we can find the gas mileage for manual transmission.

> mpg.manual = mtcars[!L,]$mpg
> mpg.manual # manual transmission mileage
[1] 21.0 21.0 22.8 32.4 30.4 ...
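
As a possible alternative to the logical index, the two groups can be obtained in one step with split. The sketch below is only an illustration; the name groups is hypothetical, and it assumes that am contains only the values 0 and 1.

```r
# Split the mileage vector by transmission type.
# The list elements are named after the am values: "0" and "1".
groups = split(mtcars$mpg, mtcars$am)

mpg.auto = groups[["0"]]     # automatic transmission mileage
mpg.manual = groups[["1"]]   # manual transmission mileage

# Number of cars in each group
sapply(groups, length)
```

Both approaches keep the original row order within each group, so the resulting vectors are the same as those built from L and !L.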

We can now apply the t.test function to estimate the difference in means of the two data samples. By default, t.test does not assume equal variances and performs the Welch two-sample t-test.

> t.test(mpg.auto, mpg.manual)

        Welch Two Sample t-test

data:  mpg.auto and mpg.manual
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.2802  -3.2097
sample estimates:
mean of x mean of y
   17.147    24.392

Answer

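In mtcars, the mean mileage of automatic transmission is 17.147 mpg and that of manual transmission is 24.392 mpg. The t.test output gives the interval for automatic minus manual, which is negative; stated as manual minus automatic, the 95% confidence interval of the difference in mean gas mileage is between 3.2097 and 11.2802 mpg.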
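
To see where the interval comes from, it can be reproduced by hand. The following is a minimal sketch that assumes the standard Welch formulas (standard error from the two sample variances, and the Welch-Satterthwaite degrees of freedom); the names v.auto, v.manual, se, df.welch and ci.byhand are chosen here for illustration.

```r
# Mileage by transmission type (0 = automatic, 1 = manual)
mpg.auto = mtcars$mpg[mtcars$am == 0]
mpg.manual = mtcars$mpg[mtcars$am == 1]

# Variance of each sample mean, and the standard error of their difference
v.auto = var(mpg.auto) / length(mpg.auto)
v.manual = var(mpg.manual) / length(mpg.manual)
se = sqrt(v.auto + v.manual)

# Welch-Satterthwaite approximation of the degrees of freedom
df.welch = (v.auto + v.manual)^2 /
    (v.auto^2 / (length(mpg.auto) - 1) + v.manual^2 / (length(mpg.manual) - 1))

# 95% confidence interval for mean(automatic) - mean(manual)
diff.means = mean(mpg.auto) - mean(mpg.manual)
ci.byhand = diff.means + c(-1, 1) * qt(0.975, df.welch) * se
ci.byhand
```

The result should agree with the interval printed by t.test(mpg.auto, mpg.manual), up to rounding of the displayed digits.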

Alternative Solution

We can model the response variable mtcars$mpg by the predictor mtcars$am, and then apply the t.test function to estimate the difference of the population means.

> t.test(mpg ~ am, data=mtcars)

        Welch Two Sample t-test

data:  mpg by am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.2802  -3.2097
sample estimates:
mean in group 0 mean in group 1
         17.147          24.392
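
The value returned by t.test is a list, so the interval can also be extracted rather than read off the printout. The sketch below assumes we want the difference expressed as manual minus automatic; the names res, ci and ci.flipped are hypothetical.

```r
# Keep the test result instead of only printing it
res = t.test(mpg ~ am, data = mtcars)

# Interval for (mean in group 0) - (mean in group 1),
# that is, automatic minus manual
ci = res$conf.int
attr(ci, "conf.level")    # confidence level used for the interval

# Same interval expressed as manual minus automatic:
# negate both endpoints and swap their order
ci.flipped = -rev(as.numeric(ci))
ci.flipped
```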

Note

Some textbooks truncate the degrees of freedom down to an integer, so their results differ slightly from those of t.test.
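
The size of that discrepancy can be checked directly. This is a rough sketch, assuming the textbook approach simply applies floor to the fractional degrees of freedom; the names df.trunc, ci.welch and ci.trunc are illustrative.

```r
res = t.test(mpg ~ am, data = mtcars)

# Fractional degrees of freedom reported by t.test, and the truncated value
df.welch = unname(res$parameter)
df.trunc = floor(df.welch)

# Recover the difference in means and its standard error from the result
# (the t statistic is the difference divided by the standard error)
diff.means = unname(res$estimate[1] - res$estimate[2])
se = diff.means / unname(res$statistic)

# 95% intervals under each choice of degrees of freedom
ci.welch = diff.means + c(-1, 1) * qt(0.975, df.welch) * se
ci.trunc = diff.means + c(-1, 1) * qt(0.975, df.trunc) * se
rbind(welch = ci.welch, truncated = ci.trunc)
```

Because the t quantile grows as the degrees of freedom shrink, the truncated interval is slightly wider than the one from t.test.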
