An R Introduction to Statistics

Interval Estimate of Population Mean with Unknown Variance

After finding a point estimate of the population mean, we need a way to quantify its accuracy. Here, we discuss the case where the population variance is not assumed to be known.

Let us denote the $100(1-\alpha/2)$ percentile of the Student t distribution with $n-1$ degrees of freedom as $t_{\alpha/2}$. For random samples of sufficiently large size, and with sample standard deviation $s$, the end points of the interval estimate at $(1-\alpha)$ confidence level are given as follows:

$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$

Problem

Without assuming the population standard deviation of the student height in the data set survey, find the margin of error and interval estimate of the mean at 95% confidence level.

Solution

We first filter out missing values in survey$Height with the na.omit function, and save it in height.response.

> library(MASS)                  # load the MASS package 
> height.response = na.omit(survey$Height)

Then we compute the sample standard deviation and the standard error estimate.

> n = length(height.response) 
> s = sd(height.response)        # sample standard deviation 
> SE = s/sqrt(n); SE             # standard error estimate 
[1] 0.68117

Since the Student t distribution has two tails, the 95% confidence level leaves 2.5% in each tail, so the upper end point corresponds to the 97.5th percentile of the distribution. Therefore, $t_{\alpha/2}$ is given by qt(.975, df=n-1). We multiply it by the standard error estimate SE and get the margin of error.

> E = qt(.975, df=n-1)*SE; E     # margin of error 
[1] 1.3429
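
The same two-tail reasoning applies at other confidence levels. As a minimal sketch, the lines below compute the critical value for 90%, 95% and 99% confidence with the 208 degrees of freedom of this sample. The variable names conf.levels and t.crit are introduced here for illustration only.

```r
conf.levels = c(0.90, 0.95, 0.99)      # illustrative confidence levels
alpha = 1 - conf.levels                # total probability left in the two tails
t.crit = qt(1 - alpha/2, df = 208)     # upper-tail critical value for each level
t.crit
```

A higher confidence level gives a larger critical value, and hence a wider margin of error for the same standard error estimate.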

We then subtract it from and add it to the sample mean, and find the confidence interval.

> xbar = mean(height.response)   # sample mean 
> xbar + c(-E, E) 
[1] 171.04 173.72
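
The steps above can be gathered into a small reusable function. This is only a sketch: t.interval is a hypothetical name, not a function in base R or MASS, and the conf.level argument is an assumption added for convenience.

```r
library(MASS)                               # provides the survey data set

# Hypothetical helper: t-based interval estimate of the mean,
# following the same steps as the solution above.
t.interval = function(x, conf.level = 0.95) {
    x = na.omit(x)                          # drop missing values
    n = length(x)
    SE = sd(x)/sqrt(n)                      # standard error estimate
    alpha = 1 - conf.level
    E = qt(1 - alpha/2, df = n - 1)*SE      # margin of error
    mean(x) + c(-E, E)                      # lower and upper end points
}

t.interval(survey$Height)
```

With the default 95% confidence level, the result should agree with the interval found above, up to the number of digits printed.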

Answer

Without assuming the population standard deviation, the margin of error for the student height survey at 95% confidence level is 1.3429 centimeters. The confidence interval is between 171.04 and 173.72 centimeters.

Alternative Solution

Instead of using the textbook formula, we can apply the t.test function in the built-in stats package.

> t.test(height.response) 
 
        One Sample t-test 
 
data:  height.response 
t = 253.07, df = 208, p-value < 2.2e-16 
alternative hypothesis: true mean is not equal to 0 
95 percent confidence interval: 
 171.04 173.72 
sample estimates: 
mean of x 
   172.38
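
If only the interval is needed, it can be read from the object that t.test returns instead of from the printed report. The sketch below assumes the same height.response data as above; the names result, ci and E.ttest are introduced here for illustration.

```r
library(MASS)                               # provides the survey data set
height.response = na.omit(survey$Height)

result = t.test(height.response, conf.level = 0.95)
ci = as.numeric(result$conf.int)            # lower and upper end points
ci
E.ttest = (ci[2] - ci[1])/2                 # margin of error: half the interval width
E.ttest
```

The conf.level argument of t.test defaults to 0.95; it is written out here so that another confidence level can be substituted.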