The correlation coefficient of two variables in a data sample is their covariance divided by the product of their individual standard deviations. It is a normalized measurement of how the two are linearly related.
Formally, the sample correlation coefficient is defined by the following formula, where $s_x$ and $s_y$ are the sample standard deviations, and $s_{xy}$ is the sample covariance.
$$ r_{xy} = \frac{s_{xy}}{s_x s_y} $$
Similarly, the population correlation coefficient is defined as follows, where $\sigma_x$ and $\sigma_y$ are the population standard deviations, and $\sigma_{xy}$ is the population covariance.
$$ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} $$
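As a minimal sketch of the definition, the ratio of the sample covariance to the product of the sample standard deviations can be computed by hand and compared with the built-in cor function. The vectors x and y below are hypothetical toy data, chosen only for illustration.

```r
# Hypothetical toy data (an assumption for illustration, not from any data set)
x = c(1, 2, 3, 4, 5)
y = c(2, 4, 5, 4, 5)

sxy = cov(x, y)             # sample covariance
sx = sd(x)                  # sample standard deviation of x
sy = sd(y)                  # sample standard deviation of y

r.manual = sxy / (sx * sy)  # covariance over the product of standard deviations
r.builtin = cor(x, y)       # built-in correlation coefficient

print(r.manual)
print(all.equal(r.manual, r.builtin))
```

Both values should agree up to floating-point rounding, since cor computes the same ratio.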
If the correlation coefficient is close to 1, it indicates that the variables are positively linearly related and the scatter plot falls almost along a straight line with positive slope. If it is close to -1, it indicates that the variables are negatively linearly related and the scatter plot falls almost along a straight line with negative slope. If it is close to zero, it indicates a weak linear relationship between the variables.
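The three cases can be illustrated with a small sketch on constructed data. The vectors below are hypothetical and exist only to show the behavior of cor. Note that the third case has a strong but nonlinear relationship, which is why a coefficient near zero speaks only about the linear relationship.

```r
# Hypothetical constructed data (an assumption for illustration)
x = 1:20

r.pos = cor(x, 2 * x + 1)          # points on a line with positive slope
r.neg = cor(x, -3 * x + 5)         # points on a line with negative slope
r.zero = cor(x, (x - mean(x))^2)   # symmetric parabola: related, but not linearly

print(c(r.pos, r.neg, r.zero))
```

The first two coefficients should come out at 1 and -1, and the third at essentially zero, up to floating-point rounding.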
Find the correlation coefficient of the eruption duration and waiting time in the data set faithful. Determine whether there is any linear relationship between the variables.
We apply the cor function to compute the correlation coefficient of eruptions and waiting.
> duration = faithful$eruptions # the eruption durations
> waiting = faithful$waiting # the waiting period
> cor(duration, waiting) # apply the cor function
The correlation coefficient of the eruption duration and waiting time is 0.90081. Since it is close to 1, we can conclude that the variables are positively linearly related.
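As a cross-check, the same value can be obtained directly from the definition, dividing the sample covariance by the product of the sample standard deviations. This is only a sketch. The variable name r is introduced here for illustration.

```r
duration = faithful$eruptions   # the eruption durations
waiting = faithful$waiting      # the waiting period

# covariance divided by the product of the standard deviations
r = cov(duration, waiting) / (sd(duration) * sd(waiting))

print(r)
print(all.equal(r, cor(duration, waiting)))   # compare with the cor function
```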