An R Introduction to Statistics

Population Mean Between Two Matched Samples

Two data samples are matched if they come from repeated observations of the same subject. Here, we assume that the differences between the matched observations follow the normal distribution. Using the paired t-test, we can obtain an interval estimate of the difference between the population means.
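For reference, the interval the paired t-test produces is the usual one-sample t interval applied to the differences. The notation below is ours, not the tutorial's: with n matched pairs (x_i, y_i), differences d_i = x_i - y_i, and confidence level 1 - α, the interval is:

```latex
\bar{d} \pm t_{\alpha/2,\, n-1} \, \frac{s_d}{\sqrt{n}},
\qquad
\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i,
\qquad
s_d^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2
```

Here t_{α/2, n-1} is the upper α/2 critical value of the Student t distribution with n - 1 degrees of freedom.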

Example

In the data set immer from the MASS package, the barley yields of the same fields in 1931 and 1932 are recorded. The yield data are in the data frame columns Y1 and Y2.

> library(MASS)         # load the MASS package 
> head(immer) 
  Loc Var    Y1    Y2 
1  UF   M  81.0  80.7 
2  UF   S 105.4  82.3 
    .....
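Before testing, it can help to confirm that each row really is one matched pair. The sketch below assumes, as the MASS documentation describes, that each row of immer is a single location (Loc) and variety (Var) combination measured in both years. It then forms the paired differences that the test works with; the variable name d is our own choice.

```r
library(MASS)  # provides the immer data set

nrow(immer)                              # number of matched pairs
anyDuplicated(immer[, c("Loc", "Var")])  # 0 means every Loc/Var combination appears once

d <- immer$Y1 - immer$Y2  # paired differences, 1931 yield minus 1932 yield
d[1:2]                    # differences for the two rows shown above: 0.3 and 23.1
```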

Problem

Assuming that the data in immer follow the normal distribution, find the 95% confidence interval estimate of the difference between the mean barley yields of 1931 and 1932.

Solution

We apply the t.test function to compute the difference in means of the matched samples. As it is a paired test, we set the paired argument to TRUE.
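Setting paired = TRUE is equivalent to running a one-sample t-test on the differences Y1 - Y2 against a hypothesized mean of 0. The minimal sketch below checks that both calls give the same confidence interval; the object names are our own.

```r
library(MASS)  # provides the immer data set

res.paired <- t.test(immer$Y1, immer$Y2, paired = TRUE)  # paired t-test
res.diff   <- t.test(immer$Y1 - immer$Y2)                # one-sample test on differences, mu = 0

# Compare the two intervals (attributes dropped so only the limits are compared)
all.equal(as.numeric(res.paired$conf.int), as.numeric(res.diff$conf.int))
```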

> t.test(immer$Y1, immer$Y2, paired=TRUE) 
 
           Paired t-test 
 
data:  immer$Y1 and immer$Y2 
t = 3.324, df = 29, p-value = 0.002413 
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval: 
  6.122 25.705 
sample estimates: 
mean of the differences 
                 15.913
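The printed report is the display of an object of class "htest", so the interval can also be extracted directly for further use. The sketch below pulls out the limits and the point estimate, and uses the conf.level argument of t.test to request a 99% interval, which is wider than the 95% one.

```r
library(MASS)  # provides the immer data set

res <- t.test(immer$Y1, immer$Y2, paired = TRUE)
res$conf.int   # lower and upper limits; attribute conf.level is 0.95
res$estimate   # mean of the differences

# A higher confidence level gives a wider interval
res99 <- t.test(immer$Y1, immer$Y2, paired = TRUE, conf.level = 0.99)
diff(res99$conf.int) > diff(res$conf.int)
```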

Answer

Between the years 1931 and 1932 in the data set immer, the 95% confidence interval of the difference in mean barley yields is between 6.122 and 25.705.

Exercise

Estimate the difference between the means of matched samples using your textbook formula.
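As a starting point for the exercise, the usual textbook interval (mean difference plus or minus a t critical value times the standard error of the differences) can be computed step by step in R. This is a sketch of that standard formula, not necessarily the exact form in your textbook; it should reproduce the interval reported by t.test above.

```r
library(MASS)  # provides the immer data set

d      <- immer$Y1 - immer$Y2         # paired differences
n      <- length(d)                   # number of pairs
d.bar  <- mean(d)                     # sample mean of the differences
s.d    <- sd(d)                       # sample standard deviation of the differences
alpha  <- 0.05                        # 1 minus the confidence level
t.crit <- qt(1 - alpha / 2, df = n - 1)  # upper critical value of the t distribution

margin <- t.crit * s.d / sqrt(n)      # margin of error
ci <- d.bar + c(-1, 1) * margin       # lower and upper limits
ci
```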