An R Introduction to Statistics

Prediction Interval for MLR

Assume that the error term ϵ in the multiple linear regression (MLR) model is independent of xk (k = 1, 2, ..., p), and is normally distributed, with zero mean and constant variance. For a given set of values of xk (k = 1, 2, ..., p), the interval estimate of an individual value of the dependent variable y is called the prediction interval.
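
For reference, the standard textbook form of this interval can be written as follows. The symbols here are assumptions introduced for illustration and do not appear in the text above: n is the number of observations, X is the n × (p + 1) design matrix with a leading column of ones, x_0 = (1, x_1, ..., x_p) holds the given predictor values, s is the residual standard error, and 1 − α is the confidence level.

```latex
\hat{y}_0 \;\pm\; t_{\alpha/2,\; n-p-1}\; s\,
\sqrt{\,1 + \mathbf{x}_0^{\top} \left( X^{\top} X \right)^{-1} \mathbf{x}_0\,}
```

The leading 1 under the square root accounts for the variance of the error term ϵ of a single new observation; without it, the expression gives the narrower confidence interval for the mean response.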


Using the data set stackloss, find a 95% prediction interval for the stack loss when the air flow is 72, the water temperature is 20 and the acid concentration is 85.


We apply the lm function to a formula that describes the variable stack.loss in terms of the variables Air.Flow, Water.Temp and Acid.Conc., and save the linear regression model in a new variable stackloss.lm.

> attach(stackloss)    # attach the data frame 
> stackloss.lm = lm(stack.loss ~ 
+     Air.Flow + Water.Temp + Acid.Conc.)

Then we wrap the given predictor values in a new data frame named newdata.

> newdata = data.frame(Air.Flow=72, 
+     Water.Temp=20, 
+     Acid.Conc.=85)

We now apply the predict function, passing the predictor values in the newdata argument. We also set the interval type to "predict" (which R partially matches to "prediction"), and use the default 0.95 confidence level.

> predict(stackloss.lm, newdata, interval="predict") 
     fit    lwr    upr 
1 24.582 16.466 32.697 
> detach(stackloss)    # clean up


The 95% prediction interval of the stack loss with the given parameters is between 16.466 and 32.697.
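
As a cross-check, the same interval can be computed by hand from the fitted model. This is a minimal sketch rather than the method predict uses internally; the names fit, x0, X, s, se, tcrit and manual are illustrative, and the model is refit with the data argument instead of attach.

```r
# Refit the model, passing the data frame explicitly instead of attaching it
fit <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)

x0 <- c(1, 72, 20, 85)       # intercept term plus the three predictor values
X  <- model.matrix(fit)      # design matrix of the fitted model
s  <- summary(fit)$sigma     # residual standard error
y0 <- sum(coef(fit) * x0)    # point prediction at x0

# Standard error of a single new observation at x0
se <- s * sqrt(1 + drop(t(x0) %*% solve(crossprod(X)) %*% x0))

# Two-sided 95% critical value on the residual degrees of freedom
tcrit <- qt(0.975, df.residual(fit))

manual <- c(fit = y0, lwr = y0 - tcrit * se, upr = y0 + tcrit * se)
print(round(manual, 3))
```

The printed values should agree with the fit, lwr and upr columns returned by predict.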


Further details of the predict function for linear regression models can be found in the R documentation.

> help(predict.lm)
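
Two of the arguments documented there are interval and level. The sketch below, which uses the illustrative names fit, newdata, pi95, pi99 and ci95, shows how a different confidence level is requested and how the prediction interval compares with the confidence interval for the mean response at the same predictor values.

```r
# Fit the model and set up the predictor values, without attach()
fit <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)
newdata <- data.frame(Air.Flow = 72, Water.Temp = 20, Acid.Conc. = 85)

# Prediction intervals at the default 95% level and at 99%
pi95 <- predict(fit, newdata, interval = "prediction")
pi99 <- predict(fit, newdata, interval = "prediction", level = 0.99)

# 95% confidence interval for the mean response at the same values
ci95 <- predict(fit, newdata, interval = "confidence")

# Stack the three results for side-by-side comparison
result <- rbind(pi95, pi99, ci95)
rownames(result) <- c("prediction 95%", "prediction 99%", "confidence 95%")
print(result)
```

All three rows share the same fitted value. The 99% prediction interval is wider than the 95% one, and the confidence interval is the narrowest because it does not include the variability of an individual observation.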