# Correlation Coefficient

The correlation coefficient of two variables in a data set equals their covariance divided by the product of their individual standard deviations. It is a normalized measure of how strongly the two are linearly related.

Formally, the sample correlation coefficient is defined by the following
formula, where s_{x} and s_{y} are the sample standard deviations, and s_{xy} is the sample
covariance.

$$r_{xy} = \frac{s_{xy}}{s_{x} s_{y}}$$

Similarly, the population correlation coefficient is defined as follows, where
σ_{x} and σ_{y} are the population standard deviations, and σ_{xy} is the population
covariance.

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_{x} \sigma_{y}}$$
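The sample definition can be checked directly in R. The following is a minimal sketch on a small made-up pair of vectors (`x` and `y` are illustrative values, not part of any data set); it relies on the fact that R's `cov` and `sd` both use the sample (n - 1) denominator, so their ratio should agree with `cor`.

```r
# Illustrative data only; any two numeric vectors of equal length would do.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Sample covariance divided by the product of the sample standard deviations.
r_manual <- cov(x, y) / (sd(x) * sd(y))

# The built-in function should give the same value.
r_builtin <- cor(x, y)

print(r_manual)
print(r_builtin)
print(all.equal(r_manual, r_builtin))
```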

If the correlation coefficient is close to 1, the variables are positively linearly related and the scatter plot falls almost along a straight line with positive slope. If it is close to -1, the variables are negatively linearly related and the scatter plot falls almost along a straight line with negative slope. A value close to zero indicates a weak linear relationship between the variables.
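These three cases can be illustrated with a small sketch on synthetic data (the vectors below are made up for illustration). The first two responses lie exactly on a line, so the coefficient should be 1 and -1 up to floating-point rounding. The third is symmetric around zero, so its coefficient should be zero even though `y_sym` is completely determined by `x`, which shows that the coefficient measures only the linear part of a relationship.

```r
# Synthetic data for illustration only.
x <- -2:2

y_up   <- 2 * x + 1    # exact line with positive slope
y_down <- -2 * x + 1   # exact line with negative slope
y_sym  <- x^2          # nonlinear, symmetric about x = 0

r_up   <- cor(x, y_up)
r_down <- cor(x, y_down)
r_sym  <- cor(x, y_sym)

print(c(up = r_up, down = r_down, sym = r_sym))
```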

#### Problem

Find the correlation coefficient of eruption duration and waiting time in the data set faithful. Observe if there is any linear relationship between the variables.

#### Solution

We apply the cor function to compute the correlation coefficient of eruptions and waiting.

> duration = faithful$eruptions # the eruption durations

> waiting = faithful$waiting # the waiting period

> cor(duration, waiting) # apply the cor function

[1] 0.90081

#### Answer

The correlation coefficient of eruption duration and waiting time is 0.90081. Since it is rather close to 1, we can conclude that the variables are positively linearly related.
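As a cross-check, the same value can be recovered from the definition, using the built-in `faithful` data set that ships with R. This is a sketch rather than part of the original solution; the variable names `r_def` and `r_cor` are chosen here for illustration.

```r
duration <- faithful$eruptions   # eruption durations
waiting  <- faithful$waiting     # waiting periods

# Covariance divided by the product of the standard deviations.
r_def <- cov(duration, waiting) / (sd(duration) * sd(waiting))

# Built-in correlation coefficient for comparison.
r_cor <- cor(duration, waiting)

print(round(r_def, 5))
print(round(r_cor, 5))
```

Both printed values should agree with the 0.90081 reported above.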