An R Introduction to Statistics

Scatter Plot

A scatter plot pairs up the values of two quantitative variables in a data set and displays them as geometric points inside a Cartesian diagram.

Example

In the data set faithful, we pair up the eruptions and waiting values from the same observation as (x, y) coordinates, and then plot the points in the Cartesian plane. Here is a preview of the first six value pairs, combined column by column with the cbind function.

> duration = faithful$eruptions      # the eruption durations 
> waiting = faithful$waiting         # the waiting interval 
> head(cbind(duration, waiting)) 
     duration waiting 
[1,]    3.600      79 
[2,]    1.800      54 
[3,]    3.333      74 
[4,]    2.283      62 
[5,]    4.533      85 
[6,]    2.883      55
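Because cbind places the two vectors side by side, row i of the resulting matrix is the (x, y) point for observation i. The short check below uses only the built-in faithful data set. The name xy is a hypothetical variable chosen for illustration.

```r
# Bind the two columns side by side; row i is the (x, y) point of observation i
duration = faithful$eruptions
waiting = faithful$waiting
xy = cbind(duration, waiting)
dim(xy)        # number of points, number of coordinates per point
xy[1, ]        # the first (x, y) pair, matching row [1,] of the preview
```

The scatter plot will therefore contain one point per row of faithful, 272 in total.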

Problem

Find the scatter plot of the eruption durations and waiting intervals in faithful. Does it reveal any relationship between the variables?

Solution

We apply the plot function to draw the scatter plot of eruptions and waiting.

> duration = faithful$eruptions      # the eruption durations 
> waiting = faithful$waiting         # the waiting interval 
> plot(duration, waiting,            # plot the variables 
+   xlab="Eruption duration",        # x-axis label 
+   ylab="Time waited")              # y-axis label
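As an alternative, plot also accepts a formula of the form y ~ x together with a data argument, which avoids creating the intermediate duration and waiting variables. The following is a sketch of the same plot written that way. The axis labels are copied from the solution above.

```r
# Same scatter plot through the formula method of plot():
# the variable left of ~ goes on the y-axis, the one on the right on the x-axis
plot(waiting ~ eruptions, data = faithful,
     xlab = "Eruption duration",   # x-axis label
     ylab = "Time waited")         # y-axis label
```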

Answer

The scatter plot of the eruption durations and waiting intervals is as follows. It reveals a positive linear relationship between them: longer eruptions tend to be followed by longer waiting intervals.

[Figure: scatter plot of Time waited (y-axis) against Eruption duration (x-axis)]
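The visual impression of a positive linear relationship can be quantified with the Pearson correlation coefficient, computed by the base R cor function. The following is a minimal sketch. The name r is a hypothetical variable chosen for illustration.

```r
# Pearson correlation between eruption duration and waiting time;
# values near +1 indicate a strong positive linear relationship
r = cor(faithful$eruptions, faithful$waiting)
r
```

For faithful the coefficient comes out at roughly 0.90, which is consistent with the tight upward trend in the plot.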

Enhanced Solution

We can fit a linear regression model of waiting on duration with the lm function, and then add the fitted trend line to the existing scatter plot with abline.

> abline(lm(waiting ~ duration))

[Figure: the same scatter plot with the fitted regression line added]
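The line drawn by abline is defined by the intercept and slope of the fitted model, and coef extracts these values. The following is a sketch that fits the model directly on the faithful columns, which is why the slope is labelled eruptions rather than duration. The name fit is a hypothetical variable chosen for illustration.

```r
# Fit waiting time as a linear function of eruption duration
fit = lm(waiting ~ eruptions, data = faithful)
coef(fit)   # intercept and slope of the trend line drawn by abline
```

Both variables are measured in minutes. The estimates come out at roughly 33.5 for the intercept and 10.7 for the slope, so each additional minute of eruption is associated with about 10.7 more minutes of waiting before the next one.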