An R Introduction to Statistics

Multinomial Goodness of Fit

A population is called multinomial if its data is categorical and each observation belongs to exactly one of a collection of discrete, non-overlapping classes.

The null hypothesis for the goodness of fit test for a multinomial distribution is that the observed frequency f_i is equal to the expected count e_i in each category. It is to be rejected if the p-value of the following Chi-squared test statistic is less than a given significance level α.

$$\chi^2 = \sum_i \frac{(f_i - e_i)^2}{e_i}$$
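
To make the formula concrete, here is a minimal sketch on hypothetical data (60 rolls of a die, not part of the survey data set). The variable names f, p, e, chi.sq and critical are illustrative assumptions. Instead of a p-value it uses the equivalent critical-value form of the rule: with k categories, the p-value is below α exactly when the statistic exceeds the upper-α quantile of the Chi-squared distribution with k - 1 degrees of freedom.

```r
# Hypothetical data: counts of each face in 60 rolls of a die
f = c(8, 12, 9, 11, 10, 10)        # observed frequencies f_i
p = rep(1/6, 6)                    # hypothesised probabilities (fair die)
e = sum(f) * p                     # expected counts e_i
chi.sq = sum((f - e)^2 / e)        # Chi-squared test statistic
df = length(f) - 1                 # degrees of freedom: categories minus one
alpha = .05
critical = qchisq(1 - alpha, df)   # upper-tail critical value
reject = chi.sq > critical         # TRUE would mean: reject the null hypothesis
```

Here the statistic is far below the critical value, so the fair-die hypothesis would not be rejected.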

Example

In the built-in data set survey, the Smoke column records the survey response about the student's smoking habit. As there are exactly four proper responses in the survey: "Heavy", "Regul" (regularly), "Occas" (occasionally) and "Never", the Smoke data is multinomial. This can be confirmed with the levels function in R.

> library(MASS)       # load the MASS package 
> levels(survey$Smoke) 
[1] "Heavy" "Never" "Occas" "Regul"

As discussed in the tutorial Frequency Distribution of Qualitative Data, we can find the frequency distribution with the table function.

> smoke.freq = table(survey$Smoke) 
> smoke.freq 
 
Heavy Never Occas Regul 
   11   189    19    17
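
Relative frequencies are often easier to compare with published percentages than raw counts. As a small sketch, the counts shown above are typed in directly (an assumption made so that the snippet runs without loading MASS; table(survey$Smoke) gives the same numbers), and prop.table converts them to proportions. The name smoke.relfreq is illustrative.

```r
# Counts copied from the table output above
smoke.freq = c(Heavy=11, Never=189, Occas=19, Regul=17)
smoke.relfreq = prop.table(smoke.freq)   # each count divided by the total
round(smoke.relfreq, 3)
```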

Problem

Suppose the campus smoking statistics are as below. Determine whether the sample data in survey supports them at the .05 significance level.

   Heavy   Never   Occas   Regul 
    4.5%   79.5%    8.5%    7.5%

Solution

We save the campus smoking statistics in a variable named smoke.prob, listing the probabilities in the same order as the levels of Smoke. Then we apply the chisq.test function and perform the Chi-squared test.

> smoke.prob = c(.045, .795, .085, .075) 
> chisq.test(smoke.freq, p=smoke.prob) 
 
        Chi-squared test for given probabilities 
 
data:  smoke.freq 
X-squared = 0.1074, df = 3, p-value = 0.991
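
The object returned by chisq.test also carries the expected counts, which are worth a glance because the Chi-squared approximation is usually considered reliable only when no expected count is very small (a common rule of thumb is at least 5, and chisq.test issues a warning otherwise). A minimal sketch, with the counts typed in from the frequency table above and the illustrative name res for the result:

```r
# Counts copied from the frequency table; same order as smoke.prob
smoke.freq = c(Heavy=11, Never=189, Occas=19, Regul=17)
smoke.prob = c(.045, .795, .085, .075)
res = chisq.test(smoke.freq, p=smoke.prob)
res$expected                 # expected counts e_i = n * p_i
all(res$expected >= 5)       # rule-of-thumb check on the approximation
```

Note that chisq.test stops with an error if the probabilities do not sum to 1, unless rescale.p=TRUE is given.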

Answer

As the p-value 0.991 is greater than the .05 significance level, we do not reject the null hypothesis that the sample data in survey is consistent with the campus-wide smoking statistics.
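
The same decision can be read off programmatically rather than from the printed output. This is only a sketch: the counts are typed in from the frequency table, and the names alpha, res and reject are illustrative.

```r
smoke.freq = c(Heavy=11, Never=189, Occas=19, Regul=17)
smoke.prob = c(.045, .795, .085, .075)
alpha = .05                          # significance level from the problem
res = chisq.test(smoke.freq, p=smoke.prob)
reject = res$p.value < alpha         # TRUE would mean: reject the null hypothesis
reject
```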

Exercise

Conduct the Chi-squared goodness of fit test for the smoking data by computing the p-value with the textbook formula.