Lab 9 Intro

John Fieberg

Objectives

Gain experience using R to perform linear regression

  • Using lm to determine best fit line
  • Hypothesis tests/confidence intervals for slope parameter
  • Checking assumptions (diagnostic plots)
  • Predictions for new cases
  • Confidence and prediction intervals

Bonus problem:

  • Effect of measurement error on slope parameter estimates
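
One way to explore the bonus problem is by simulation. The sketch below is only an illustration on made-up data: the variable names (x_true, x_obs), sample size, and error standard deviations are all assumptions, not values from the lab. It fits the same regression twice, once with the predictor measured exactly and once with random measurement error added to it.

```r
# Hypothetical simulation: all values below are assumptions, not lab data
set.seed(3)
n <- 5000

x_true <- rnorm(n, mean = 10, sd = 3)       # predictor without error
y      <- 1 + 2 * x_true + rnorm(n, sd = 2) # true slope is 2
x_obs  <- x_true + rnorm(n, sd = 3)         # predictor recorded with error

# Slope estimates from the error-free and the error-prone predictor
slope_true <- coef(lm(y ~ x_true))[["x_true"]]
slope_obs  <- coef(lm(y ~ x_obs))[["x_obs"]]

c(true_x = slope_true, noisy_x = slope_obs)
```

In this setup the measurement error variance equals the variance of the true predictor, so the slope estimated from x_obs should be pulled toward zero by a factor of roughly 9 / (9 + 9) = 0.5.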

Data

  • Is log(home range size, base=10) predictive of mortality rates in zoos?
  • Can bite force be used to predict territory size?

Picture of a lizard.

Both questions involve 2 quantitative variables

Diagnostic plots

Scrabble revisited

\(Y = \beta_0 + \beta_1 X + \epsilon\)

Scatterplot of Scrabble Score versus number of letters with regression line overlaid.

Assumptions:

  • \(E[Y \mid X] = \beta_0 + \beta_1 X\)
  • The errors (\(\epsilon\)) are randomly distributed above and below the line
  • The spread of the data about the line is constant
  • The errors, \(\epsilon\), follow a Normal distribution
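
The assumptions above can be checked graphically once a model is fit. As a minimal sketch, the code below simulates data that satisfy all four assumptions (the data frame sim, its columns, and the parameter values are hypothetical, not the Scrabble data), fits the line with lm, and draws two basic residual plots.

```r
# Hypothetical data simulated from Y = beta_0 + beta_1 * X + epsilon
set.seed(1)
n <- 50
x <- runif(n, min = 2, max = 20)
y <- 1 + 2 * x + rnorm(n, mean = 0, sd = 3)  # beta_0 = 1, beta_1 = 2, Normal errors
sim <- data.frame(x = x, y = y)

# Least-squares fit of the line
fit <- lm(y ~ x, data = sim)
coef(fit)

# Residuals versus fitted values: look for no trend and constant spread
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals: points should fall near the line
qqnorm(resid(fit))
qqline(resid(fit))
```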

Residual versus fitted value plots.

Performance package

Residual diagnostic plots.
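
A possible way to produce such plots is check_model() from the performance package. The sketch below assumes the performance and see packages are installed and falls back to base R plots otherwise; the simulated data frame sim is hypothetical and stands in for the lab data.

```r
# Hypothetical data standing in for the lab data
set.seed(2)
sim <- data.frame(x = runif(40, min = 2, max = 20))
sim$y <- 1 + 2 * sim$x + rnorm(40, sd = 3)
fit <- lm(y ~ x, data = sim)

if (requireNamespace("performance", quietly = TRUE) &&
    requireNamespace("see", quietly = TRUE)) {
  # Panel of diagnostic plots (linearity, constant variance, Normality, ...)
  performance::check_model(fit)
} else {
  # Base R fallback: residuals vs fitted, and Normal Q-Q plot
  plot(fit, which = 1:2)
}
```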

Predictions and uncertainty

Predictions

Residuals:
    Min      1Q  Median      3Q     Max 
-5.4144 -2.6085 -0.3791  2.0856  9.8327 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.01441    1.58601   0.009    0.993    
Letters      1.78823    0.13877  12.887 1.54e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.942 on 25 degrees of freedom
Multiple R-squared:  0.8692,    Adjusted R-squared:  0.8639 
F-statistic: 166.1 on 1 and 25 DF,  p-value: 1.535e-12
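
The test statistic, p-value, and a confidence interval for the slope can be reproduced from the Letters row of this output. The sketch below uses the rounded estimate, standard error, and degrees of freedom printed above, so the results may differ from R's own output in the last digit; the object names (b1, se_b1, and so on) are made up for illustration.

```r
# Numbers copied (rounded) from the summary output above
b1    <- 1.78823   # slope estimate for Letters
se_b1 <- 0.13877   # standard error of the slope
df    <- 25        # residual degrees of freedom

# Test statistic for H0: beta_1 = 0
t_stat <- b1 / se_b1

# Two-sided p-value from a t distribution with 25 df
p_val <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the slope: estimate +/- t* x SE
ci_slope <- b1 + c(-1, 1) * qt(0.975, df) * se_b1

t_stat
p_val
ci_slope
```

With the fitted model object available, confint(lm.scrab) should return the same kind of interval directly.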

What score would you predict for someone with 10 letters in their name?

Score = 0.014 + 1.79*10 ≈ 17.9

Two types of intervals

Scatterplot of Scrabble Score versus number of letters with regression line, 95% confidence, and 95% prediction intervals overlaid.

Predictions for someone with 10 letters in their name

> scrab.pred<-makeFun(lm.scrab)
> scrab.pred(Letters=10, interval="confidence")
       fit     lwr      upr
1 17.85648 16.3517 19.36127
> scrab.pred(Letters=10, interval="prediction")
       fit      lwr      upr
1 17.85648 9.757121 25.95584

Confidence interval = we are 95% confident that the average score among all individuals with 10 letters in their name is between 16.35 and 19.36.

Prediction interval = we are 95% confident that the score of a single new individual with 10 letters in their name will be between 9.76 and 25.96.
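
Both kinds of interval can also be obtained with base R's predict() instead of makeFun. The sketch below is a stand-in under stated assumptions: the data are simulated (the data frame sim and its values are hypothetical, only the column names Letters and Score mirror the example), so the numbers will not match the output above.

```r
# Hypothetical data: simulated, not the actual Scrabble data
set.seed(4)
sim <- data.frame(Letters = sample(4:18, size = 30, replace = TRUE))
sim$Score <- 2 * sim$Letters + rnorm(30, sd = 4)
fit <- lm(Score ~ Letters, data = sim)

# New case: someone with 10 letters in their name
new_case <- data.frame(Letters = 10)

# Interval for the mean score among all such individuals
conf_int <- predict(fit, newdata = new_case, interval = "confidence", level = 0.95)

# Interval for the score of one new individual
pred_int <- predict(fit, newdata = new_case, interval = "prediction", level = 0.95)

conf_int
pred_int
```

Both intervals are centered on the same fitted value, but the prediction interval is wider because it also accounts for the scatter of individual scores about the line.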