An R Introduction to Statistics

Covariance

The covariance of two variables x and y in a data set measures how the two are linearly related. A positive covariance indicates a positive linear relationship between the variables, and a negative covariance indicates a negative one.

The sample covariance is defined in terms of the sample means as:

$$ s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$
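
As a minimal sketch of the sample covariance formula, the following computes the sum of products of deviations by hand and compares it with the built-in cov function. The vectors x and y are small hypothetical values chosen only for illustration; they are not taken from any data set used in this tutorial.

```r
# Hypothetical illustration data (an assumption, not from the tutorial)
x = c(1, 2, 3, 4, 5)
y = c(2, 4, 5, 4, 5)

n = length(x)                      # sample size

# Sum of products of deviations from the sample means, divided by n - 1
s_xy = sum((x - mean(x)) * (y - mean(y))) / (n - 1)

s_xy                               # manual result
cov(x, y)                          # built-in result, should agree
```

Both expressions should give the same value, since cov uses the n - 1 denominator.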

Similarly, the population covariance is defined in terms of the population means μx and μy as:

$$ \sigma_{xy} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y) $$
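
The cov function in R divides by n - 1, so it returns the sample covariance rather than the population covariance. As a rough sketch, assuming the hypothetical vectors below make up an entire population, the population covariance can be computed directly or by rescaling the result of cov.

```r
# Hypothetical illustration data, treated here as a complete population
x = c(1, 2, 3, 4, 5)
y = c(2, 4, 5, 4, 5)

N = length(x)                      # population size

# Direct computation with the 1/N denominator
sigma_xy = sum((x - mean(x)) * (y - mean(y))) / N

# Equivalent: rescale the sample covariance by (N - 1)/N
sigma_xy_rescaled = cov(x, y) * (N - 1) / N

sigma_xy
sigma_xy_rescaled                  # should agree with sigma_xy
```

For large N the two definitions differ very little, because the factor (N - 1)/N is close to 1.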

Problem

Find the covariance of eruption duration and waiting time in the data set faithful. Determine whether there is a positive or negative linear relationship between the two variables.

Solution

We apply the cov function to compute the covariance of eruptions and waiting.

> duration = faithful$eruptions   # eruption durations 
> waiting = faithful$waiting      # the waiting period 
> cov(duration, waiting)          # apply the cov function 
[1] 13.978

Answer

The covariance of eruption duration and waiting time is about 14. Its positive sign indicates a positive linear relationship between the two variables: longer eruptions tend to go with longer waiting times.