An R Introduction to Statistics

Interval Estimate of Population Proportion

After obtaining a point estimate of the population proportion from a sample, we need to estimate its confidence interval.

Let us denote the 100(1 − α∕2) percentile of the standard normal distribution as zα∕2. If the sample size n and the population proportion p satisfy the conditions np ≥ 5 and n(1 − p) ≥ 5, then the end points of the interval estimate at the (1 − α) confidence level are defined in terms of the sample proportion as follows.

\[
\bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}
\]
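
The sample-size condition can be checked before the formula is applied. The following is a minimal sketch, assuming the unknown p is replaced by the sample proportion k/n (a common rule of thumb, not something stated above). The helper name large_sample_ok and the counts passed to it are made up for illustration.

```r
# Hypothetical helper (not part of base R): checks the large-sample
# condition with the sample proportion pbar = k/n standing in for p.
large_sample_ok <- function(k, n) {
  # n * pbar equals k, and n * (1 - pbar) equals n - k,
  # so the condition reduces to two exact integer comparisons.
  (k >= 5) && (n - k >= 5)
}

large_sample_ok(12, 40)   # made-up counts: 12 successes, 28 failures
large_sample_ok(2, 40)    # made-up counts: only 2 successes
```

Working with the counts k and n − k rather than with k/n avoids floating-point rounding at the boundary value of 5.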

Problem

Compute the margin of error and the interval estimate for the proportion of female students in the survey data set at the 95% confidence level.

Solution

We first determine the proportion point estimate. Further details can be found in the previous tutorial.

> library(MASS)                  # load the MASS package 
> gender.response = na.omit(survey$Sex) 
> n = length(gender.response)    # valid responses count 
> k = sum(gender.response == "Female") 
> pbar = k/n; pbar 
[1] 0.5

Then we estimate the standard error.

> SE = sqrt(pbar*(1-pbar)/n); SE     # standard error 
[1] 0.032547

Since the normal distribution has two tails, the 95% confidence level leaves 2.5% in each tail, so the upper end point corresponds to the 97.5th percentile of the normal distribution. Therefore, zα∕2 is given by qnorm(.975). We multiply it by the standard error estimate SE to obtain the margin of error.

> E = qnorm(.975)*SE; E              # margin of error 
[1] 0.063791
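
The choice of .975 can be checked numerically. The following is a small sketch using only qnorm and pnorm from base R. The variable names alpha, z, central_area and conf_levels are illustrative and do not appear elsewhere in this tutorial.

```r
alpha <- 0.05                  # 1 minus the confidence level
z <- qnorm(1 - alpha / 2)      # same value as qnorm(.975)

# The area under the standard normal curve between -z and z
# should equal the confidence level.
central_area <- pnorm(z) - pnorm(-z)
print(z)
print(central_area)

# The same recipe gives critical values for other confidence levels.
conf_levels <- c(0.90, 0.95, 0.99)
print(qnorm(1 - (1 - conf_levels) / 2))
```

Replacing alpha with another value adapts the margin of error computation to a different confidence level.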

Combining it with the sample proportion, we obtain the confidence interval.

> pbar + c(-E, E) 
[1] 0.43621 0.56379
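
The three steps of the solution can also be collected into one function. This is a minimal sketch rather than a library routine: the name prop_interval and its arguments k, n and conf.level are hypothetical, and it assumes the large-sample condition already holds.

```r
library(MASS)   # provides the survey data set used in this tutorial

# Hypothetical helper: normal-approximation interval for a proportion,
# given k successes out of n valid responses.
prop_interval <- function(k, n, conf.level = 0.95) {
  pbar <- k / n                                # point estimate
  SE <- sqrt(pbar * (1 - pbar) / n)            # standard error
  E <- qnorm(1 - (1 - conf.level) / 2) * SE    # margin of error
  c(lower = pbar - E, upper = pbar + E, margin = E)
}

gender.response <- na.omit(survey$Sex)
k <- sum(gender.response == "Female")
n <- length(gender.response)
prop_interval(k, n)
```

Passing a different conf.level, for example prop_interval(k, n, conf.level = 0.99), widens or narrows the interval without repeating the intermediate steps.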

Answer

At the 95% confidence level, between 43.6% and 56.4% of the university students are female, and the margin of error is 6.4%.

Alternative Solution

Instead of using the textbook formula, we can apply the prop.test function in the built-in stats package.

> prop.test(k, n) 
 
       1-sample proportions test without continuity 
           correction 
 
data:  k out of n, null probability 0.5 
X-squared = 0, df = 1, p-value = 1 
alternative hypothesis: true p is not equal to 0.5 
95 percent confidence interval: 
 0.43672 0.56328 
sample estimates: 
  p 
0.5
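
The interval reported by prop.test is slightly narrower than the one from the textbook formula (0.43672 to 0.56328 versus 0.43621 to 0.56379). As far as the R documentation indicates, prop.test computes a Wilson score interval rather than the normal-approximation interval used above. The following sketch extracts the interval from the test result and reproduces it by hand under that assumption. The variable names ci, center, half and wilson are illustrative, and correct = FALSE is set explicitly so that the comparison does not depend on the continuity correction.

```r
library(MASS)   # provides the survey data set used in this tutorial

gender.response <- na.omit(survey$Sex)
n <- length(gender.response)
k <- sum(gender.response == "Female")
pbar <- k / n

# Pull the confidence interval out of the test result.
ci <- prop.test(k, n, correct = FALSE)$conf.int
print(ci)

# Wilson score interval computed by hand (assumed to be what
# prop.test reports when no continuity correction is applied).
z <- qnorm(.975)
center <- (pbar + z^2 / (2 * n)) / (1 + z^2 / n)
half <- z * sqrt(pbar * (1 - pbar) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(center - half, center + half)
print(wilson)
```

The two printed intervals should agree, which accounts for the small difference from the textbook result.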