An R Introduction to Statistics

Covariance

The covariance of two variables x and y in a data sample measures how the two are linearly related. A positive covariance indicates a positive linear relationship between the variables, and a negative covariance indicates a negative one.

The sample covariance is defined in terms of the sample means as:

$$
s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
$$
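
As a check on the definition, the sample covariance can be evaluated term by term in R and compared with the built-in cov function. This is a minimal sketch using the built-in faithful data set; the names x, y, n and s_xy are chosen here for illustration only.

```r
# Sample covariance computed directly from the definition.
# The variable names are illustrative, not part of any API.
x <- faithful$eruptions          # eruption durations
y <- faithful$waiting            # waiting times
n <- length(x)                   # sample size

# Sum the products of deviations from the sample means, then divide by n - 1.
s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

s_xy                             # value from the definition
all.equal(s_xy, cov(x, y))       # compare with the built-in cov function
```

Dividing by n − 1 rather than n is what distinguishes the sample covariance from the population version.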

Similarly, the population covariance is defined in terms of the population means μx, μy as:

$$
\sigma_{xy} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)
$$
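
The cov function in R uses the n − 1 denominator of the sample covariance, so the population formula has to be computed separately. The sketch below assumes, purely for illustration, that the faithful data set is the entire population; the names N and sigma_xy are hypothetical.

```r
# Population covariance, treating the data as the whole population
# (an assumption made only for this illustration).
x <- faithful$eruptions
y <- faithful$waiting
N <- length(x)                   # population size

# Deviations are taken from the population means and the sum is divided by N.
sigma_xy <- sum((x - mean(x)) * (y - mean(y))) / N

sigma_xy
# cov() divides by N - 1, so rescaling it by (N - 1) / N gives the same value.
all.equal(sigma_xy, cov(x, y) * (N - 1) / N)
```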

Problem

Find the covariance of the eruption duration and waiting time in the data set faithful. Determine whether there is a linear relationship between the two variables.

Solution

We apply the cov function to compute the covariance of eruptions and waiting.

> duration = faithful$eruptions   # the eruption durations 
> waiting = faithful$waiting      # the waiting period 
> cov(duration, waiting)          # apply the cov function 
[1] 13.978

Answer

The covariance of the eruption duration and waiting time is about 13.978. Because the value is positive, it indicates a positive linear relationship between the two variables.
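
The sign of the covariance is what supports this conclusion; its magnitude depends on the units of measurement. As a rough sketch, rescaling the waiting time from minutes to hours shrinks the covariance by a factor of 60 but leaves its sign unchanged. The variable waiting_hr is a hypothetical rescaled copy introduced only for this illustration.

```r
# The sign of the covariance is unaffected by a change of units.
duration    <- faithful$eruptions
waiting_min <- faithful$waiting        # waiting time in minutes
waiting_hr  <- waiting_min / 60        # hypothetical copy expressed in hours

cov_min <- cov(duration, waiting_min)
cov_hr  <- cov(duration, waiting_hr)

sign(cov_min) == sign(cov_hr)          # same direction of relationship
all.equal(cov_hr, cov_min / 60)        # magnitude scales with the units
```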