---
title: "Lab 9: Simple Linear Regression"
author:  "ADD YOUR NAME HERE"
date: now
date-format: "DD/MM/YYYY HH:mm"
format:
    pdf
#    html:
#      theme: a11y
#      highlight-style: a11y
#      self-contained: true
---


#### Load R libraries

```{r}
#| warning: false
#| message: false
library(mosaic)
library(knitr)
library(abd)
library(performance)
knitr::opts_chunk$set(error = TRUE)
```

## Setting the seed of the random number generator

Use the **set.seed()** function in R to initialize the random number generator.  

 

```{r}
set.seed(2041971)

```
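
As a minimal sketch of why the seed matters (this chunk is illustrative only and is not one of the lab exercises), resetting the seed to the same value reproduces exactly the same random draws. The object names `first.draw` and `second.draw` are made up for this example.

```{r}
set.seed(2041971)
first.draw <- rnorm(3)    # three random Normal values

set.seed(2041971)         # same seed again
second.draw <- rnorm(3)   # reproduces the same three values

identical(first.draw, second.draw)

set.seed(2041971)         # reset so later chunks start from the seeded state
```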


## Zoo Mortality

Read in the data:
```{r}
data(ZooMortality)
```

### Exercise 1

Draw a scatterplot of these data using `gf_point()` and `gf_lm()`, with the log of home-range size as the explanatory variable.  

```{r}
 

```

QUESTION:  Describe the relationship between the two variables in words.

ANSWER: 



### Exercise 2

 
```{r}
lm.zoo<-lm(  )


```

QUESTION:  Using the estimates from the R output, write the equation of the regression line. Interpret the slope coefficient. Is the slope significantly different from 0?

ANSWER:  

 

### Exercise 3

Use the equation for the fitted line to predict the level of infant mortality for a species with log.homerange = 1.

```{r}



```
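
A prediction from a fitted line is just the intercept plus the slope times the chosen x value. The sketch below shows this arithmetic on R's built-in `cars` data, which stands in for the lab data here; the names `fit.cars`, `b`, and `manual.pred` are hypothetical.

```{r}
# Illustration only: `cars` (speed, dist) is a built-in data set, not the lab data.
fit.cars <- lm(dist ~ speed, data = cars)

b <- coef(fit.cars)               # b[1] = intercept, b[2] = slope
manual.pred <- b[1] + b[2] * 10   # fitted line evaluated at speed = 10
unname(manual.pred)

# predict() evaluates the same fitted line
predict(fit.cars, newdata = data.frame(speed = 10))
```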

### Exercise 4

Code for calculating prediction and confidence intervals is included below, using functions in the mosaic library:
```{r}
mort.pred<-makeFun(lm.zoo)
mort.pred(log.homerange=1, interval="confidence")
mort.pred(log.homerange=1, interval="prediction")
```
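
For comparison, base R's `predict()` can return the same two kinds of intervals. This is a sketch on the built-in `cars` data rather than the lab data, and `fit.cars`, `new.obs`, `conf.int`, and `pred.int` are names invented for the example.

```{r}
# Illustration only: `cars` is a built-in data set, not the lab data.
fit.cars <- lm(dist ~ speed, data = cars)
new.obs  <- data.frame(speed = 10)   # x value at which to predict

# Interval for the mean response at speed = 10
conf.int <- predict(fit.cars, newdata = new.obs, interval = "confidence", level = 0.95)

# Interval for a single new observation at speed = 10
pred.int <- predict(fit.cars, newdata = new.obs, interval = "prediction", level = 0.95)

# Each result has columns fit (point prediction), lwr, and upr (interval limits)
conf.int
pred.int
```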
 

QUESTION:  Interpret these intervals for a value of log.homerange = 1. [note: if we have not yet discussed these two types of intervals in class, you can skip this for now and then revisit the question on your own after we cover this section in class].

ANSWER:   

 

### Exercise 5


QUESTION:  Which interval is wider? Why? [note: if we have not yet discussed these two types of intervals in class, you can skip this for now and then revisit the question on your own after we cover this section in class].

ANSWER:  


We can plot these two types of intervals using:

```{r}
gf_point(mortality~log.homerange, data=ZooMortality) %>%
  gf_lm(interval = "prediction", fill = "DarkBlue") %>%
  gf_lm(interval = "confidence", fill = "DarkRed") 
```


### Exercise 6

Create diagnostic plots to assess whether the assumptions for linear regression (linearity, constant variance, normality) are met.  

Scatterplot to evaluate linearity:
```{r}
gf_point(mortality~log.homerange, data=ZooMortality, xlab="Log home-range size",
          ylab="Infant Mortality Rate") %>% gf_lm() 
```


```{r}
#| fig-alt: Residual diagnostic plots
#| out-width: "100%"
#| fig-width: 8
#| fig-height: 8
check_model(lm.zoo, check = c("linearity", "homogeneity", "qq"))
```

QUESTION:  Comment on whether or not you think the assumptions for linear regression are met.

ANSWER (provided in this case, but you should read through the text):  The linearity assumption seems reasonable, based on the first (upper left) plot: we can fit a horizontal line at $y = 0$ through the shaded area encompassing the residuals. However, we should not extrapolate outside the range of the data, as this can lead to predicted values < 0 or > 1. 

Upper right plot: The green trend line drops quickly near the far left of the plot due to a single outlying residual. If that residual were removed, then the trend line would largely be horizontal, and I would conclude that the constant variability assumption is reasonable.  
Lower left plot: The points do not fall close to the line, but none of them fall outside of the gray bands.  Therefore, we do not have a lot of evidence to suggest that the Normality assumption is problematic (even though the plot doesn't look great).    

Another way to produce diagnostic plots is to use the plot function associated
with the regression object. This will, by default, create 4 different plots. We 
will only look at the first two, and display them side-by-side. The line of code
`par(mfrow=c(1,2))` creates a 1 x 2 panel of plots.

```{r}
#| fig-alt: Residual diagnostic plots.
par(mfrow=c(1,2))  
plot(lm.zoo, which=1:2, add.smooth=FALSE)
```

Plots:

- The first plot is a residual versus fitted plot (similar to the upper left plot from the performance package).  The variability about the horizontal line (at 0) should be similar across the range of x (i.e., variability should be similar for all fitted values).
- The second plot is used to assess Normality, and is called a Q-Q plot.  It is similar to the lower left plot created by the performance package. If the residuals come from a Normal distribution, then the dots should fall pretty much on a straight line.  This plot looks pretty good.  There are a couple of points in the extremes that are labelled and fall a bit off the line. I wouldn't worry too much about these, but these points could suggest the distribution of the residuals is non-normal, with "heavier tails" (similar to a t-distribution).  
 
In each plot, observations that may be extreme are numbered (the number indicates the row where the observation occurs in the data set).  The 20th observation (for Polar bears) is flagged in each plot.  We may want to explore how results change if we were to drop this observation.
 

### Exercise 7

Outliers should often be investigated because they might have a substantial effect on the estimates of slope and intercept. Recalculate the slope and intercept of the regression line after excluding the outlier with the largest home-range size (corresponding to a polar bear). Hint: remember the `filter()` function.
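
As a reminder of how `filter()` works, here is a sketch on the built-in `cars` data (not the lab data). It drops the row with the largest `dist` value and refits the line. The call is written as `dplyr::filter()` so that the example does not depend on which packages are attached, and the names `fit.all`, `cars.trimmed`, and `fit.trimmed` are hypothetical.

```{r}
# Illustration only: `cars` is a built-in data set, not the lab data.
fit.all <- lm(dist ~ speed, data = cars)

# Keep every row except the one(s) with the largest stopping distance
cars.trimmed <- dplyr::filter(cars, dist < max(dist))
fit.trimmed  <- lm(dist ~ speed, data = cars.trimmed)

# Compare intercept and slope with and without the extreme row
rbind(all = coef(fit.all), trimmed = coef(fit.trimmed))
```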

```{r}



```    

QUESTION:  How did the slope change after dropping the outlier?

ANSWER:   


## Jaw Strength and Territory Size

```{r}
data(LizardBite)
```

### Exercise 1

```{r}



```

QUESTION:  How rapidly does territory size increase with bite force? Estimate the slope of the regression line. Provide a standard error for your estimate.

ANSWER:  



### Exercise 2

Check the assumptions needed for linear regression.

```{r}



```

Using the plot function...

```{r}



```

QUESTION:  Comment on whether or not you think the assumptions are met.

ANSWER: 



### Exercise 3

How uncertain is our estimate of slope? Provide a 90% confidence interval for the slope (use bootstrapping or the t-distribution).
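
Both approaches can be sketched on the built-in `cars` data (not the lab data). The t-based interval is the estimate plus or minus a t quantile times the standard error. The bootstrap version here is a base R percentile bootstrap, which is one of several ways to bootstrap and may differ from the method shown in class. All object names are made up for this example.

```{r}
# Illustration only: `cars` is a built-in data set, not the lab data.
fit.cars <- lm(dist ~ speed, data = cars)

# t-based 90% interval: estimate +/- t quantile * standard error
est  <- coef(summary(fit.cars))["speed", "Estimate"]
se   <- coef(summary(fit.cars))["speed", "Std. Error"]
t.ci <- est + c(-1, 1) * qt(0.95, df = df.residual(fit.cars)) * se
t.ci
confint(fit.cars, "speed", level = 0.90)   # same interval from the built-in helper

# Percentile bootstrap: refit the model on rows resampled with replacement
boot.slopes <- replicate(1000, {
  rows <- sample(nrow(cars), replace = TRUE)
  coef(lm(dist ~ speed, data = cars[rows, ]))["speed"]
})
quantile(boot.slopes, probs = c(0.05, 0.95))
```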

```{r}



```



### Exercise 4

QUESTION:  Provide an interpretation for the 90% confidence interval calculated in Exercise 3.  

ANSWER:  


### Exercise 5

```{r}

```

QUESTION:  How does your estimate of the slope change when we introduce additional 
measurement error?

ANSWER: 
