Comparison of Two Population Proportions

A survey conducted in two distinct populations will generally produce different results. It is often necessary to compare the proportion of a particular survey response between the two populations. Here, we assume that the populations follow the normal distribution.

Example

In the built-in data set named quine, children from an Australian town are classified by ethnic background, gender, age, learning status and the number of days absent from school.

> library(MASS)         # load the MASS package
> head(quine)
  Eth Sex Age Lrn Days
1   A   M  F0  SL    2
2   A   M  F0  SL   11
.....

In effect, the data frame column Eth indicates whether the student is Aboriginal or Not ("A" or "N"), and the column Sex indicates Male or Female ("M" or "F").

In R, we can tally the student ethnicity against the gender with the table function. As the result shows, 38 students within the Aboriginal student population are female, whereas 42 students within the Non-Aboriginal student population are female.

> table(quine$Eth, quine$Sex)

     F  M
  A 38 31
  N 42 35
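
As a quick check on the tally, the female proportion within each ethnic group can be read off by converting the counts to row proportions. The sketch below re-enters the four counts from the table by hand so that it runs without the MASS package. The names tbl and row_props are our own, and with MASS loaded one would presumably pass table(quine$Eth, quine$Sex) to prop.table instead.

```r
# Counts copied from the table above (rows: Eth, columns: Sex).
tbl <- matrix(c(38, 42, 31, 35), nrow = 2,
              dimnames = list(Eth = c("A", "N"), Sex = c("F", "M")))

# margin = 1 divides each count by its row total, so each row sums to 1.
row_props <- prop.table(tbl, margin = 1)
print(round(row_props, 5))
```

The F column then holds the two female proportions, roughly 0.55 in each group.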

Problem

Assuming that the data in quine follow the normal distribution, find the 95% confidence interval estimate of the difference between the female proportion of Aboriginal students and the female proportion of Non-Aboriginal students, each within their own ethnic group.

Solution

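We apply the prop.test function to compute the difference in female proportions. Because F is the first column of the table, prop.test treats the female counts as the successes in each group. Yates's continuity correction is disabled for pedagogical reasons.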

> prop.test(table(quine$Eth, quine$Sex), correct=FALSE)

        2-sample test for equality of proportions
        without continuity correction

data:  table(quine$Eth, quine$Sex)
X-squared = 0.0041, df = 1, p-value = 0.949
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.15642  0.16696
sample estimates:
 prop 1  prop 2
0.55072 0.54545
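
To see where the interval comes from, we can rebuild it by hand from the tallied counts. The sketch below assumes the usual normal-approximation interval for a difference of two proportions, without continuity correction: the difference of the sample proportions plus or minus a normal quantile times the unpooled standard error. The variable names (x, n, p_hat, se, ci_manual, ci_proptest) are our own, and the counts are copied from the table rather than read from quine, so the code runs without MASS.

```r
# Female counts and group sizes, copied from the tally of Eth against Sex.
x <- c(A = 38, N = 42)            # females in each ethnic group
n <- c(A = 38 + 31, N = 42 + 35)  # total students in each ethnic group

p_hat <- x / n                                  # sample female proportions
diff_hat <- unname(p_hat["A"] - p_hat["N"])     # estimated difference
se <- sqrt(sum(p_hat * (1 - p_hat) / n))        # unpooled standard error
z <- qnorm(0.975)                               # two-sided 95% normal quantile

ci_manual <- diff_hat + c(-1, 1) * z * se
print(round(ci_manual, 5))

# Cross-check: prop.test also accepts success counts and group sizes directly.
ci_proptest <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)
print(round(ci_proptest, 5))
```

Both printed intervals should agree with the one reported above, about -0.156 to 0.167. Since the interval contains zero, the data do not indicate a difference in female proportion between the two groups at this confidence level.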