An R Introduction to Statistics

Estimated Simple Regression Equation

If we choose the parameters α and β in the simple linear regression model so as to minimize the sum of squares of the error term ϵ, we obtain the so-called estimated simple regression equation, in which a and b denote the resulting estimates of α and β. It allows us to compute fitted values of y based on values of x.

ŷ = a + bx
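As a rough illustration of what minimizing the sum of squares produces, the sketch below computes a and b from the standard closed-form least-squares solution, b = cov(x, y) / var(x) and a = mean(y) - b * mean(x), and compares them with what lm reports. These formulas are not stated in the text above; they are the usual textbook result. The names x, y, a, b and fit are chosen here only for illustration.

```r
# Closed-form least-squares estimates for y = a + b*x,
# using the built-in faithful data set.
x = faithful$waiting      # explanatory variable
y = faithful$eruptions    # response variable

b = cov(x, y) / var(x)    # estimated slope
a = mean(y) - b*mean(x)   # estimated intercept

# Compare with the coefficients reported by lm
fit = lm(eruptions ~ waiting, data=faithful)
print(c(a=a, b=b))
print(coefficients(fit))
```

Both printed vectors should agree up to floating-point rounding.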

Problem

Apply the simple linear regression model to the data set faithful, and estimate the next eruption duration if the waiting time since the last eruption has been 80 minutes.

Solution

We apply the lm function to a formula that describes the variable eruptions by the variable waiting, and save the linear regression model in a new variable eruption.lm.

> eruption.lm = lm(eruptions ~ waiting, data=faithful)

Then we extract the parameters of the estimated regression equation with the coefficients function.

> coeffs = coefficients(eruption.lm); coeffs 
(Intercept)     waiting 
  -1.874016    0.075628

We now compute the fitted eruption duration using the estimated regression equation.

> waiting = 80           # the waiting time 
> duration = coeffs[1] + coeffs[2]*waiting 
> duration 
(Intercept) 
     4.1762
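The (Intercept) label on the result is only a name carried over from coeffs[1]; the value itself is the fitted duration, not an intercept. If the label is distracting, one option is to drop it with unname, as in this minimal sketch, which repeats the steps above so that it runs on its own.

```r
# Refit the model and extract the coefficients as in the steps above
eruption.lm = lm(eruptions ~ waiting, data=faithful)
coeffs = coefficients(eruption.lm)

waiting = 80    # the waiting time

# unname() removes the "(Intercept)" name inherited from coeffs[1]
duration = unname(coeffs[1] + coeffs[2]*waiting)
print(duration)
```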

Answer

Based on the simple linear regression model, if the waiting time since the last eruption has been 80 minutes, we expect the next one to last 4.1762 minutes.

Alternative Solution

We wrap the waiting parameter value inside a new data frame named newdata.

> newdata = data.frame(waiting=80) # wrap the parameter

Then we apply the predict function to eruption.lm along with newdata.

> predict(eruption.lm, newdata)    # apply predict 
     1 
4.1762
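Because newdata is a data frame, predict can also handle several waiting times in one call. The sketch below uses a few hypothetical waiting times, chosen only for illustration, and checks the results against the estimated regression equation. The names pred and manual are assumptions made for this example.

```r
# Refit the model so that the example is self-contained
eruption.lm = lm(eruptions ~ waiting, data=faithful)
coeffs = unname(coefficients(eruption.lm))

# Hypothetical waiting times in minutes (illustrative values only)
newdata = data.frame(waiting=c(60, 70, 80, 90))

# Fitted durations from predict, with the row labels dropped
pred = unname(predict(eruption.lm, newdata))

# The same durations from the estimated regression equation
manual = coeffs[1] + coeffs[2]*newdata$waiting

print(data.frame(waiting=newdata$waiting, predicted=pred, manual=manual))
print(all.equal(pred, manual))
```

The two columns should match, since predict evaluates the same equation for each row of newdata.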