An R Introduction to Statistics

Category Statistics

In the data set painters, which ships with the MASS package, the painters are classified according to the schools they belong to. Each school can be characterized by its various statistics, such as its mean composition, drawing, coloring and expression scores.

Suppose we would like to know which school has the highest mean composition score. We would first have to find the mean composition score of each school. The following shows how to find the mean composition score of one arbitrarily chosen school.

Problem

Find out the mean composition score of school C in the data set painters.

Solution

The solution consists of a few steps:

  1. Create a logical index vector for school C.
    > library(MASS)                 # load the MASS package 
    > school = painters$School      # the painter schools 
    > c_school = school == "C"      # the logical index vector
  2. Find the child data set of painters for school C, that is, the rows of painters selected by the logical index vector. For an explanation, please consult the tutorial on Data Frame Row Slice.
    > c_painters = painters[c_school, ]  # child data set
  3. Find the mean composition score of school C.
    > mean(c_painters$Composition) 
    [1] 13.167
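
The three steps above can also be collapsed into a single expression. The following is a minimal sketch, not part of the original solution: it indexes the Composition column directly with the school C condition instead of building the child data set first. The variable name c_mean is chosen here for illustration only.

```r
library(MASS)  # provides the painters data set

# Select the composition scores of school C directly, then average them.
# c_mean is an illustrative name, not one used in the steps above.
c_mean <- mean(painters$Composition[painters$School == "C"])

# Round to three decimals to compare with the value found in step 3.
print(round(c_mean, 3))
```

The result should agree with the mean found in step 3, since both approaches average the same rows.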

Answer

The mean composition score of school C is 13.167.

Alternative Solution

Instead of computing the mean composition score manually for each school, use the tapply function to compute them all at once. It groups the composition scores by school and applies mean to each group.

> tapply(painters$Composition, painters$School, mean) 
     A      B      C      D      E      F      G      H 
10.400 12.167 13.167  9.100 13.571  7.250 13.857 14.000
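
To see what tapply does in this call, the same grouping can be written out in two explicit steps. This is a rough sketch of an equivalent computation, not a description of how tapply is implemented: split breaks the scores into one vector per school, and sapply applies mean to each of them. The names by_school and school_means are illustrative.

```r
library(MASS)  # provides the painters data set

# Step 1: split the composition scores into a list with one vector per school.
by_school <- split(painters$Composition, painters$School)

# Step 2: apply mean to each vector; the result is a named numeric vector.
school_means <- sapply(by_school, mean)

# Round to three decimals to compare with the tapply result.
print(round(school_means, 3))
```

The rounded values should match the tapply output school by school. For a single grouping factor such as School, tapply is simply the more compact way to write this.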

Exercise

  1. Find programmatically the school with the highest mean composition score.
  2. Find the percentage of painters whose colour score (the Colour column of painters) is equal to or above 14.