An R Introduction to Statistics

Population Mean Between Two Independent Samples

Two data samples are independent if they come from unrelated populations and the samples do not affect each other. Here, we assume that the data populations follow the normal distribution. Using the unpaired t-test, we can obtain an interval estimate of the difference between two population means.


The data frame column mpg of the data set mtcars contains the gas mileage data of various 1974 U.S. automobiles.

> mtcars$mpg 
 [1] 21.0 21.0 22.8 21.4 18.7 ...

Meanwhile, another data column in mtcars, named am, indicates the transmission type of the automobile model (0 = automatic, 1 = manual).

> mtcars$am 
 [1] 1 1 1 0 0 0 0 0 ...

In particular, the gas mileage for manual and automatic transmissions are two independent data populations.


Assuming that the data in mtcars follows the normal distribution, find the 95% confidence interval estimate of the difference between the mean gas mileage of manual and automatic transmissions.


As mentioned in the tutorial Data Frame Row Slice, the gas mileage for automatic transmission can be listed as follows:

> L = mtcars$am == 0 
> mpg.auto = mtcars[L,]$mpg 
> mpg.auto                    # automatic transmission mileage 
 [1] 21.4 18.7 18.1 14.3 24.4 ...

By applying the negation of L, we can find the gas mileage for manual transmission.

> mpg.manual = mtcars[!L,]$mpg 
> mpg.manual                  # manual transmission mileage 
 [1] 21.0 21.0 22.8 32.4 30.4 ...
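The same two groups can also be produced in a single step with the split function. The following is a minimal sketch rather than part of the original solution; the names groups, auto.mpg and manual.mpg are chosen here for illustration only.

```r
# Split the mileage values by transmission type.
# The result is a list with one numeric vector per value of am ("0" and "1").
groups <- split(mtcars$mpg, mtcars$am)

auto.mpg   <- groups[["0"]]   # illustrative name: automatic transmission mileage
manual.mpg <- groups[["1"]]   # illustrative name: manual transmission mileage

# Number of automobile models in each group.
lengths(groups)
```

Either approach yields the same vectors; split simply avoids building the logical index by hand.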

We can now apply the t.test function to estimate the difference in means of the two data samples.

> t.test(mpg.auto, mpg.manual) 
        Welch Two Sample t-test 
data:  mpg.auto and mpg.manual 
t = -3.7671, df = 18.332, p-value = 0.001374 
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval: 
 -11.2802  -3.2097 
sample estimates: 
mean of x mean of y 
   17.147    24.392


In mtcars, the mean mileage of automatic transmission automobiles is 17.147 mpg and that of manual transmission automobiles is 24.392 mpg. The 95% confidence interval of the difference in mean gas mileage (manual minus automatic) is between 3.2097 and 11.2802 mpg.
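To see where the interval comes from, it can be reproduced by hand. The sketch below assumes the Welch form of the unpaired t-test (unequal variances, Welch-Satterthwaite degrees of freedom), which is what the t.test output reports; the variable names x, y, vx, vy, se, df and ci are illustrative.

```r
x <- mtcars$mpg[mtcars$am == 0]   # automatic transmission mileage
y <- mtcars$mpg[mtcars$am == 1]   # manual transmission mileage

# Estimated variance of each sample mean.
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Standard error of the difference in sample means.
se <- sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom.
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# 95% confidence interval for mean(x) - mean(y).
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df) * se
ci
```

Because the difference is taken as automatic minus manual, both endpoints are negative, matching the sign of the interval printed by t.test.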

Alternative Solution

We can model the response variable mtcars$mpg by the predictor mtcars$am, and then apply the t.test function to estimate the difference of the population means.

> t.test(mpg ~ am, data=mtcars) 
        Welch Two Sample t-test 
data:  mpg by am 
t = -3.7671, df = 18.332, p-value = 0.001374 
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval: 
 -11.2802  -3.2097 
sample estimates: 
mean in group 0 mean in group 1 
         17.147          24.392
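The value returned by t.test is a list-like object, so the individual results can be extracted rather than read off the printed report. A brief sketch, with res as an illustrative name:

```r
res <- t.test(mpg ~ am, data = mtcars)

res$estimate    # sample means of group 0 (automatic) and group 1 (manual)
res$conf.int    # 95% confidence interval for mean(group 0) - mean(group 1)
res$parameter   # degrees of freedom used by the Welch test
```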


Some textbooks truncate the degrees of freedom down to an integer, in which case the result differs slightly from that of t.test.
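The effect of truncation can be illustrated as follows. This is only a sketch of one textbook convention (rounding the Welch degrees of freedom down with floor); the names x, y, se, welch, df.trunc and ci.trunc are illustrative.

```r
x <- mtcars$mpg[mtcars$am == 0]   # automatic transmission mileage
y <- mtcars$mpg[mtcars$am == 1]   # manual transmission mileage

# Standard error of the difference in sample means (unequal variances).
se <- sqrt(var(x) / length(x) + var(y) / length(y))

# Degrees of freedom reported by t.test, truncated down to an integer.
welch    <- t.test(x, y)
df.trunc <- floor(unname(welch$parameter))

# 95% confidence interval based on the truncated degrees of freedom.
ci.trunc <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df.trunc) * se

ci.trunc         # interval with truncated degrees of freedom
welch$conf.int   # interval from t.test, for comparison
```

With fewer degrees of freedom the t quantile is larger, so the truncated interval is slightly wider than the one reported by t.test.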