Lab 9: Simple Linear Regression

Learning Objectives

  • Use R to determine a best fit line.
  • Generate predicted values, confidence intervals for the best fit line, and prediction intervals for new points.
  • Evaluate assumptions using diagnostic plots.
  • Explore the effect of measurement error using simulation.

We will use a series of examples and problems from Whitlock and Schluter’s Analysis of Biological Data to explore linear regression in today’s lab.

Setting the Seed

Don’t forget to “set the seed” of the random number generator in R so that you get the same answer each time you render the lab document (this helps when writing up your answers).
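As a minimal sketch of what setting the seed does, the base R code below draws the same three random numbers twice. The seed value 1234 and the object names first.draw and second.draw are arbitrary choices for illustration; any integer seed works the same way.

```r
# Setting the same seed before each draw reproduces the same "random" numbers.
set.seed(1234)            # 1234 is an arbitrary, hypothetical seed value
first.draw <- rnorm(3)

set.seed(1234)            # reset the generator to the same starting point
second.draw <- rnorm(3)

identical(first.draw, second.draw)  # TRUE
```

In a lab document, a single set.seed() call near the top, before any code that uses random numbers, is usually enough.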

Lab Exercises

This example, described below, is taken verbatim from W&S (problem 6, Chapter 17).

Some species seem to thrive in captivity, whereas others are prone to health and behavior difficulties when caged. Maternal care problems in some captive species, for example, lead to high infant mortality. Can these differences be predicted?

The data file ZooMortality in the abd library contains measurements of the infant mortality (percent of births) of 20 carnivore species in captivity along with the log (Base 10) of the minimal home-range sizes (in km\(^2\)) of the same species in the wild (Clubb and Mason 2003).

  1. Draw a scatterplot of these data with a least squares regression line using gf_point(y~x, data=) %>% gf_lm(). Make sure that log of home-range size is used as the explanatory (\(x\)) variable. Describe the relationship between the two variables in words.

  2. Use the lm function to estimate the slope and intercept of the least-squares regression line. Save the result in an object called lm.zoo. Examine the fitted model using the summary function (Hint: if necessary, look at the help file for lm to see how to use the summary function). Using the estimates from the R output, write the equation of the regression line. Interpret the slope coefficient. Is the slope significantly different from 0?

  3. Use the equation for the fitted line (and R as a calculator) to predict the level of infant mortality for a species with log.homerange = 1.

  4. Rather than calculate predicted values “by hand”, we can use the makeFun function in the mosaic library to do this for us.

mort.pred <- makeFun(lm.zoo)
mort.pred(log.homerange = 1)
mort.pred(log.homerange = 1, interval = "confidence")
mort.pred(log.homerange = 1, interval = "prediction")

Interpret these intervals for log.homerange = 1. [note: if we have not yet discussed these two types of intervals in class, you can skip this for now and then revisit the question on your own after we cover this section in class].

  5. Which interval (confidence or prediction) is wider? Why? [note: if we have not yet discussed these two types of intervals in class, you can skip this for now and then revisit the question on your own after we cover this section in class].

  6. Model diagnostics: In order to check if the conditions for a linear regression are met we should check for (1) linearity, (2) constant variability, and (3) nearly normal residuals. I’ve included code to generate relevant plots to assess whether the conditions are met. You will need to adapt this code for the second data application.

  7. Outliers should often be investigated because they might have a substantial effect on the estimates of slope and intercept. Recalculate the slope and intercept of the regression line after excluding the outlier with the largest home-range size (corresponding to a polar bear). Hint: remember the filter function. By how much did your slope change?
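The steps in this example (prediction from the fitted line, the two kinds of intervals, diagnostic plots, and refitting without an extreme point) can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, the names sim, fit.sim, and sim.trim are hypothetical, and the plots are not necessarily the ones produced by the code in the lab template. It uses predict() as a base R counterpart to makeFun() and subset() as a stand-in for filter().

```r
# Simulated stand-in data (hypothetical; NOT the ZooMortality data)
set.seed(1)
sim <- data.frame(x = runif(30, min = -1, max = 3))
sim$y <- 10 + 5 * sim$x + rnorm(30, mean = 0, sd = 4)

fit.sim <- lm(y ~ x, data = sim)

# Prediction at x = 1 "by hand" from the estimated intercept and slope
b <- coef(fit.sim)
by.hand <- unname(b["(Intercept)"] + b["x"] * 1)

# The same prediction from predict(), with each type of interval
new.pt <- data.frame(x = 1)
conf.int <- predict(fit.sim, newdata = new.pt, interval = "confidence")
pred.int <- predict(fit.sim, newdata = new.pt, interval = "prediction")

# (1) Linearity and (2) constant variability: residuals vs. fitted values
plot(fitted(fit.sim), resid(fit.sim),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# (3) Nearly normal residuals: normal quantile-quantile plot
qqnorm(resid(fit.sim))
qqline(resid(fit.sim))

# Refit after dropping the observation with the largest x, then
# compare the two sets of coefficients
sim.trim <- subset(sim, x < max(x))
fit.trim <- lm(y ~ x, data = sim.trim)
coef(fit.sim) - coef(fit.trim)
```

The "fit" column of conf.int and pred.int matches by.hand; the "lwr" and "upr" columns give the interval endpoints to compare.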

This example (and set of questions) is taken verbatim from W&S (problem 12, Chapter 17).

Male lizards in the species Crotaphytus collaris use their jaws as weapons during territorial interactions. Lappin and Husak (2005) tested whether weapon performance (bite force) predicted territory size in this species.

Measurements of both variables are listed in the LizardBite dataset in the abd library for 11 males.

  1. How rapidly does territory size increase with bite force? Estimate the slope of the regression line. Provide a standard error for your estimate.

  2. Model diagnostics: In order to check if the conditions for a linear regression are met we should check for (1) linearity, (2) constant variability, and (3) nearly normal residuals. Generate relevant plots to assess whether the conditions are met (and comment in your template). You will need to adapt the code from the previous example (i.e., from the log-home range size versus infant mortality example).

  3. How uncertain is our estimate of slope? Provide a 90% confidence interval (use bootstrapping or the t-distribution).

  4. Provide an interpretation for the 90% confidence interval in part (3).

  5. Bite force is difficult to measure accurately, so the values shown probably include some measurement error. To explore the effect of measurement error, add additional measurement error to the predictor. For example, create a variable LizardBite$bite.error <- LizardBite$bite + rnorm(nrow(LizardBite), mean = 0, sd = 0.5). Note: rnorm(nrow(LizardBite), mean = 0, sd = 0.5) generates a random “error” for each observation, normally distributed with mean 0 and sd = 0.5, which is then added to each value of bite. Use bite.error as the predictor instead of bite. How does your estimate of the slope change when you introduce additional measurement error? You may also want to plot the data using gf_point(territory ~ bite.error, data = LizardBite) %>% gf_lm().
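The interval and simulation steps in this example can be sketched in base R as follows. This is a sketch under stated assumptions rather than a solution: the data are simulated, and all object names (sim, fit, boot.slopes, noisy.slopes) are hypothetical. It shows a 90% confidence interval for the slope from the t-distribution (via confint() and by hand), a percentile bootstrap interval from resampled rows, and a repeated version of the measurement-error simulation. Repeating the simulation is a choice made here because a single random draw of errors can move the slope in either direction.

```r
# Simulated stand-in data (hypothetical; NOT the LizardBite data)
set.seed(2024)
n <- 11
sim <- data.frame(x = rnorm(n, mean = 5, sd = 1))
sim$y <- 2 + 3 * sim$x + rnorm(n, mean = 0, sd = 1)

fit <- lm(y ~ x, data = sim)

# 90% confidence interval for the slope using the t-distribution
ci.t <- confint(fit, parm = "x", level = 0.90)

# The same interval by hand: estimate +/- t quantile * standard error
est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]
ci.hand <- est + c(-1, 1) * qt(0.95, df = df.residual(fit)) * se

# Percentile bootstrap: resample rows with replacement and refit
boot.slopes <- replicate(2000, {
  rows <- sample(nrow(sim), replace = TRUE)
  coef(lm(y ~ x, data = sim[rows, ]))["x"]
})
ci.boot <- quantile(boot.slopes, probs = c(0.05, 0.95), na.rm = TRUE)

# Measurement error: add fresh noise to the predictor many times and
# record the slope from each noisy refit
noisy.slopes <- replicate(1000, {
  x.error <- sim$x + rnorm(n, mean = 0, sd = 0.5)
  coef(lm(sim$y ~ x.error))["x.error"]
})

# Compare the original slope with the distribution of noisy slopes
est
summary(noisy.slopes)
```

With only 11 observations the bootstrap and t-based intervals can differ noticeably, so it is worth looking at both.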

Literature cited

Clubb, R. and G. Mason. 2003. Captivity effects on wide-ranging carnivores. Nature 425:473-474.

Lappin, A. K. and J. F. Husak. 2005. Weapon performance, not size, determines mating success and potential reproductive output in the collared lizard (Crotaphytus collaris). American Naturalist 166:426-436.

A first draft of this lab was adapted from a lab created by Dr. Kari Lock-Morgan (which I can no longer find or access). In addition to changing much of the text, I have used a different data set and modified the coding exercises.

The lab is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license.