---
title: "Lab 10: Multiple Regression"
author:  "ADD YOUR NAME HERE"
date: now
date-format: "DD/MM/YYYY HH:mm"
format:
    pdf
#    html:
#      theme: a11y
#      highlight-style: a11y
#      self-contained: true
---

#### Load R libraries

```{r}
#| warning: false
#| message: false
library(mosaic)
library(knitr)
library(abd)
library(ggplot2)
library(performance)
options(width=150)
knitr::opts_chunk$set(error = TRUE)
```

## Setting the seed of the random number generator

Use the **set.seed()** function in R to initialize the random number generator so that any results that depend on random numbers are reproducible.

```{r}
set.seed(2041971)

```
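As a quick illustration of what the seed does, the sketch below resets the generator to the same seed twice and checks that the two sets of draws match. The object names `draws_a` and `draws_b` are illustrative only. The final `set.seed()` call puts the generator back in the state set in the chunk above.

```{r}
# Seeding twice with the same value reproduces the same "random" draws
set.seed(2041971)
draws_a <- rnorm(3)

set.seed(2041971)            # reset the generator to the same starting state
draws_b <- rnorm(3)

identical(draws_a, draws_b)  # the two sets of draws match

set.seed(2041971)            # restore the state set in the chunk above
```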


## Mole Rats

Load the `MoleRats` data from the **abd** package:

```{r}
data(MoleRats)
```


#### Exercise 1


```{r}
# Additive model: one common slope for ln.mass, separate intercepts for each caste
mod1 <- lm(ln.energy ~ ln.mass + caste, data = MoleRats)
summary(mod1)
```


QUESTION: Write out the equation for the model and interpret the 3 parameters
in the model. Is ln.mass significant? Also, do the castes appear to expend
different amounts of energy? 

ANSWER:  

#### Exercise 2

```{r}
# Interaction model: separate intercepts and separate slopes for each caste
mod2 <- lm(ln.energy ~ ln.mass + caste + ln.mass:caste, data = MoleRats)
summary(mod2)
```
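In a model with an interaction between a numeric predictor and a two-level factor, the slope for the reference level is the coefficient on the numeric predictor, and the slope for the other level is that coefficient plus the interaction coefficient. The sketch below checks this on a small **simulated** data set (`sim_dat`, with hypothetical groups `A` and `B`; it is not the mole-rat data). It compares the slopes implied by the interaction model with slopes from separate regressions fit to each group.

```{r}
# Simulated (hypothetical) data: two groups with different true slopes,
# plus a small deterministic "noise" term so the fit is not exact
sim_dat <- data.frame(x = rep(1:10, times = 2),
                      group = factor(rep(c("A", "B"), each = 10)))
sim_dat$y <- ifelse(sim_dat$group == "A", 1 + 0.5 * sim_dat$x, 2 + 0.8 * sim_dat$x) +
  sin(1:20) / 10

# Interaction model: separate intercepts and slopes for each group
int_fit <- lm(y ~ x + group + x:group, data = sim_dat)
int_coefs <- coef(int_fit)

# Group-specific slopes implied by the coefficients ("A" is the reference level)
int_slope_A <- int_coefs[["x"]]
int_slope_B <- int_coefs[["x"]] + int_coefs[["x:groupB"]]

# The same slopes from separate regressions fit to each group
sep_slope_A <- coef(lm(y ~ x, data = subset(sim_dat, group == "A")))[["x"]]
sep_slope_B <- coef(lm(y ~ x, data = subset(sim_dat, group == "B")))[["x"]]

rbind(interaction_model = c(A = int_slope_A, B = int_slope_B),
      separate_fits     = c(A = sep_slope_A, B = sep_slope_B))
```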

QUESTION: Write out the equation for the model and interpret the 4 parameters in
the model. What is the slope of the line relating changes in ln.mass to changes
in ln.energy for lazy individuals? What is the slope for workers?

ANSWER:  

#### Exercise 3

```{r}
AIC(mod1, mod2)
```
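AIC is defined as $\text{AIC} = -2\log L + 2k$, where $L$ is the maximized likelihood and $k$ is the number of estimated parameters. Lower values indicate a better trade-off between fit and complexity. As a sketch, the code below recomputes AIC by hand for a regression on R's built-in `mtcars` data (an example data set, not part of this lab). For `lm()` fits, $k$ counts the regression coefficients plus the residual variance.

```{r}
# Example regression on R's built-in mtcars data (not part of this lab)
aic_fit <- lm(mpg ~ wt + hp, data = mtcars)

aic_ll <- logLik(aic_fit)
aic_k  <- attr(aic_ll, "df")   # coefficients (3) plus the residual variance (1)
aic_by_hand <- -2 * as.numeric(aic_ll) + 2 * aic_k

c(by_hand = aic_by_hand, AIC_function = AIC(aic_fit))
```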

QUESTION: Compare the two fitted models using AIC (`AIC(model1name, model2name)`).
Also, comment on the significance of the interaction term as judged by its t-test
(from `summary()` applied to the interaction model, `mod2`). Which model is better?
Justify your answer.

ANSWER:


#### Exercise 4 Plotting  

The code below plots a separate regression line, each with its own slope, for each caste:

```{r}
#| fig-alt: Scatterplot of log energy expenditure versus log mass for both castes with separate regression lines overlaid.
gf_point(ln.energy~ln.mass, color=~caste, data=MoleRats) %>% gf_lm()  

```

Next, plot the fitted lines from `mod1`, which have a common slope for both castes. To create this plot, we first append the predicted values from `mod1` to the `MoleRats` data set and then use `gf_lm()` to add lines through these predicted values.

```{r}
#| fig-alt: Scatterplot of log energy expenditure versus log mass for both castes with regression lines with common slope overlaid.
MoleRats<-mutate(MoleRats, pred= predict(mod1))
gf_point(ln.energy~ln.mass, color=~caste, data=MoleRats) %>% gf_lm(pred~ln.mass)
```
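Why does `gf_lm(pred ~ ln.mass)` produce parallel lines? Within each group, the predictions from an additive model lie exactly on a straight line whose slope is the common slope coefficient, so a least-squares line fit to those predictions within each group recovers that slope. The sketch below checks this with the built-in `mtcars` data (an assumption standing in for `MoleRats`), using transmission type `am` as the grouping factor.

```{r}
# Additive model with a common slope for wt and separate intercepts by am
cs_fit <- lm(mpg ~ wt + factor(am), data = mtcars)
cs_dat <- transform(mtcars, pred = predict(cs_fit))

# Fit a straight line to the predictions within each group, mimicking what
# gf_lm(pred ~ ...) does for each colour group
cs_slopes <- sapply(split(cs_dat, cs_dat$am),
                    function(d) coef(lm(pred ~ wt, data = d))[["wt"]])

cs_slopes               # one slope per group
coef(cs_fit)[["wt"]]    # the common slope from the additive model
```

Drawing the predicted values directly as lines (for example with ggformula's `gf_line()`) should give the same picture, because the predictions already fall on straight lines.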

     
    
#### Exercise 5 Checking assumptions

Let's use the performance package to evaluate the assumptions:  

```{r }
#| fig-alt: Residual diagnostic plots
#| out-width: "100%"
#| fig-width: 8
#| fig-height: 8
check_model(mod1, check = c("linearity", "homogeneity", "qq", "normality"))
```

Now, add the code to produce the same diagnostic plots for the model with the interaction (`mod2`):

```{r }
#| fig-alt: Residual diagnostic plots
#| out-width: "100%"
#| fig-width: 8
#| fig-height: 8

```
 

Comments on the assumptions:
   
    
## Modeling abundance of longnose dace

Read in the data:

```{r}
dace<- read.csv("longnosedace.csv")
```

#### Exercise 1
 
Let's begin by fitting a full model, that is, a model containing all of the predictor
variables in the data set:

```{r}
fullmod<-lm(longnosedace~acreage+do2+maxdepth+no3+so4+temp,data=dace)
summary(fullmod)
```

QUESTION: Look at the fitted model using summary(fullmod). Which variables 
are statistically significant?

ANSWER:  



#### Exercise 2


```{r}
#| fig-alt: Residual diagnostic plots
#| out-width: "100%"
#| fig-width: 8
#| fig-height: 8
check_model(fullmod, check = c("linearity", "homogeneity", "qq", "normality"))
```


QUESTION: Are the assumptions of linear regression met for the full model?

ANSWER: 

#### Exercise 3

Try to find the ‘best’ model using `stepAIC()` from the **MASS** package.

First, load the MASS package:

```{r}
#| warning: false
#| message: false
library(MASS)
```

Now, run `stepAIC()` on the full model:

```{r}
#| comment: NA
stepAIC(fullmod)
```
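When called without a `scope` argument, as above, `stepAIC()` performs backward elimination: at each step it drops the single term whose removal lowers AIC the most, and it stops when no removal lowers AIC. It ranks models with `extractAIC()`, which for `lm()` fits differs from `AIC()` by a constant for a given data set, so the ranking is the same. The sketch below illustrates both points on the built-in `mtcars` data. The predictors are arbitrary and the names `step_full` and `step_best` are hypothetical.

```{r}
library(MASS)

# Hypothetical full model on the built-in mtcars data
step_full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Backward elimination by AIC; trace = FALSE suppresses the step-by-step printout
step_best <- stepAIC(step_full, direction = "backward", trace = FALSE)
formula(step_best)

# extractAIC() returns c(equivalent df, AIC); stepAIC() compares the second element
extractAIC(step_full)[2]
extractAIC(step_best)[2]

# For lm fits, AIC() and extractAIC() differ by the same constant for both models
AIC(step_full) - extractAIC(step_full)[2]
AIC(step_best) - extractAIC(step_best)[2]
```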


#### Exercise 4

Fit the model chosen by `stepAIC()` and inspect its coefficients with the `summary()`
function:

```{r}


```
 

#### Exercise 5

 
QUESTION: Are the assumptions of linear regression met for the best model?
Use code similar to that in Exercise 2 to explore the assumptions.

```{r}

```

ANSWER:   


#### Exercise 6

 
```{r}
 
```
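The two $R^2$ values reported by `summary()` are related. Multiple $R^2$ never decreases when predictors are added to a model, whereas adjusted $R^2$ penalizes extra predictors via $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ the number of predictors (not counting the intercept). The sketch below verifies this formula against `summary()` for a model fit to the built-in `mtcars` data (an illustrative stand-in, not the dace data).

```{r}
# Illustrative model on the built-in mtcars data (a stand-in, not the dace data)
r2_fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
r2_sum <- summary(r2_fit)

r2_n <- nrow(mtcars)   # number of observations
r2_p <- 3              # number of predictors, excluding the intercept
r2_adj_by_hand <- 1 - (1 - r2_sum$r.squared) * (r2_n - 1) / (r2_n - r2_p - 1)

c(multiple_R2 = r2_sum$r.squared,
  adjusted_R2 = r2_sum$adj.r.squared,
  adjusted_by_hand = r2_adj_by_hand)
```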

QUESTION: Compare the two measures of $R^2$ reported by `summary()` (multiple $R^2$ and
adjusted $R^2$) for both the full and 'best' models.

ANSWER:


## Model Averaging


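```{r}
library(MuMIn)
# dredge() requires na.action = "na.fail" so that every candidate model
# is fit to the same set of observations
options(na.action = "na.fail")
ms1 <- dredge(fullmod) # fit all possible subsets of the predictors in fullmod
ms1

# Plot the model-selection table
par(mfrow = c(1, 1))
plot(ms1)

# Average the models within 4 AICc units of the best model
model.avg(ms1, subset = delta < 4)

# Average the models in the 95% confidence set (cumulative weight <= 0.95)
confset.95p <- get.models(ms1, cumsum(weight) <= .95)
avgmod.95p <- model.avg(confset.95p)
summary(avgmod.95p)
```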
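By default, `dredge()` ranks models by AICc, a small-sample correction of AIC, $\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n-k-1}$. Its `weight` column holds Akaike weights, $w_i = \exp(-\Delta_i/2) / \sum_j \exp(-\Delta_j/2)$, where $\Delta_i$ is model $i$'s AICc minus the smallest AICc in the table. These are the weights that `cumsum(weight) <= .95` accumulates in the code above. The sketch below computes them by hand for three hypothetical candidate models fit to the built-in `mtcars` data. It illustrates the formulas and is not a replacement for `dredge()`.

```{r}
# Three hypothetical candidate models on the built-in mtcars data
aw_models <- list(
  wt         = lm(mpg ~ wt, data = mtcars),
  wt_hp      = lm(mpg ~ wt + hp, data = mtcars),
  wt_hp_qsec = lm(mpg ~ wt + hp + qsec, data = mtcars)
)
aw_n <- nrow(mtcars)

# AICc = AIC + 2k(k + 1) / (n - k - 1), with k the number of estimated parameters
aw_aicc <- sapply(aw_models, function(m) {
  k <- attr(logLik(m), "df")
  AIC(m) + 2 * k * (k + 1) / (aw_n - k - 1)
})

# Akaike weights: relative support for each model within this candidate set
aw_delta  <- aw_aicc - min(aw_aicc)
aw_weight <- exp(-aw_delta / 2) / sum(exp(-aw_delta / 2))

round(cbind(AICc = aw_aicc, delta = aw_delta, weight = aw_weight), 3)
```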
