---
title: "Lab 7: Normal and t-distribution"
author:  "ADD YOUR NAME HERE"
date: now
date-format: "DD/MM/YYYY HH:MM"
format:
    pdf
#    html:
#      theme: a11y
#      highlight-style: a11y
#      self-contained: true
---
 

#### Load R libraries

```{r}
#| warning: false
#| message: false
library(mosaic)
library(knitr)
library(abd)
knitr::opts_chunk$set(
  error = TRUE # do not interrupt in case of errors
)
```

## Setting the seed of the random number generator

Use the **set.seed()** function in R to initialize the random number generator.   

```{r}
set.seed(2041971)

```
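As a small illustration of why the seed matters, the sketch below uses base R only and an arbitrary seed (123, chosen just for this example) to show that resetting the seed reproduces the same random draws.

```r
# Arbitrary seed, chosen only for this illustration
set.seed(123)
first.draw <- rnorm(3)

# Resetting the same seed reproduces the same draws
set.seed(123)
second.draw <- rnorm(3)

same.draws <- identical(first.draw, second.draw)
same.draws
```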
 
## Inference for proportions

### Inference Using Formulas

####  Exercise 1
 

QUESTION:  State the null and alternative hypotheses. Is it appropriate to conduct a one-tailed or two-tailed test?
Why? 

ANSWER:   


#### Exercise 2

Under the null hypothesis that the proportion of nests that are predated does not depend on the treatment group, it is appropriate to use a pooled estimate, $\hat{p}$, when calculating the standard error. Find the pooled proportion $\hat{p}$ = overall proportion of nests that are found by predators. MAKE SURE YOU UNDERSTAND THE FORMULA USED HERE!

```{r}
#| comment: NA
(p.pooled<-52/112) # total predated nests / total nests
# or, equivalently, a weighted average of the two group proportions
(60*(13/60)+52*(39/52))/(60+52)
```
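The two calculations in the chunk above give the same value. Written as a formula, with $x_i$ the number of predated nests and $n_i$ the number of nests in group $i$ (notation introduced here for illustration only):

$$
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}
$$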

#### Exercise 3 

Use the pooled $\hat{p}$ to calculate the standard error using the relevant formula.

```{r}
#| comment: NA
(se.pdiff.Test<-sqrt(p.pooled*(1-p.pooled)*(1/52 + 1/60)))
```
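For reference, the chunk above evaluates the pooled standard error of the difference in sample proportions. With $n_1$ and $n_2$ denoting the two group sizes (notation assumed here for illustration), it can be written as:

$$
SE_{\text{pooled}} = \sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
$$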

#### Exercise 4

Calculate the z-statistic, and use the normal distribution to find the p-value. 
```{r}
#| comment: NA
(p1<-39/52)
(p2<-13/60)
(p.diff<-p1-p2)
(z.diffprop<-(p.diff)/se.pdiff.Test)
```

ENTER CODE, below, to determine the p-value

```{r}


```
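As a reminder of how tail areas work, here is a minimal base R sketch. The z-statistic is a hypothetical value, not the one from this lab, and `pnorm()` is used instead of mosaic's plotting wrapper.

```r
# Hypothetical z-statistic, used only to illustrate the calculation
z.example <- 2.1

# Upper-tail (one-sided) p-value: P(Z >= z)
p.upper <- pnorm(z.example, lower.tail = FALSE)

# Two-sided p-value: P(|Z| >= |z|)
p.two.sided <- 2 * pnorm(-abs(z.example))

c(upper = p.upper, two.sided = p.two.sided)
```

Which tail (or tails) to use depends on the alternative hypothesis you stated in Exercise 1.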

QUESTION:  What does this tell you about the importance of eggshells in the ability of predators to find nests?

ANSWER:  


#### Exercise 5

Use formulas and the normal distribution to find a 99% confidence interval for the  difference in the proportion of nests that are found by predators in the two treatment groups.

I have included code, below, to do this for you, but you must fill in the blank in the call to the xqnorm function.
```{r}
se.diff.CI<-sqrt(p1*(1-p1)/52 + p2*(1-p2)/60)
zcrit<-xqnorm( ) # FILL IN THE CORRECT INFORMATION HERE TO GET z*!
p1-p2+c(zcrit, -zcrit)*se.diff.CI
```
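The sketch below shows the general pattern for a normal-based interval using base R's `qnorm()`. It deliberately uses a 95% level and a hypothetical estimate and standard error, so the numbers are not the ones needed for this exercise.

```r
# Confidence level for this illustration only (the exercise asks for 99%)
conf.level.example <- 0.95
alpha <- 1 - conf.level.example

# z* leaves alpha/2 in each tail of the standard normal distribution
zstar.example <- qnorm(1 - alpha / 2)
zstar.example

# Hypothetical estimate and standard error
est <- 0.20
se <- 0.05
ci.example <- est + c(-1, 1) * zstar.example * se
ci.example
```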

QUESTION:  Why did we use a different SE when conducting a hypothesis test
compared to the SE used to calculate the confidence interval? 

ANSWER:  

## Inference using prop.test


### Create the data set
```{r}
#| comment: NA
eggdat<-data.frame(group=c(rep("Nest with shell", 52), 
                           rep("Nest without Shell",60)),
                   result=rep(c("Predated","Not Predated",
                                "Predated","Not Predated"),
                              c(39,13,13,47)))
tally(result~group, data=eggdat, format="count", margins=TRUE)
```


####  Exercise 1

Create a 99% confidence interval using prop.test().

```{r}
 
 
```
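A minimal sketch of `prop.test()` using the base R counts interface, with hypothetical counts rather than the nest data. Setting `correct = FALSE` is a choice made here so that the interval matches the unpooled normal-approximation formula.

```r
# Hypothetical counts: successes and sample sizes in two groups
successes <- c(30, 20)
totals <- c(50, 50)

# correct = FALSE turns off the continuity correction so the interval
# matches the plain normal-approximation formula
prop.example <- prop.test(x = successes, n = totals,
                          conf.level = 0.95, correct = FALSE)
prop.example$estimate   # sample proportions in each group
prop.example$conf.int   # interval for the difference in proportions
```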


###  Inference Using Simulation


#### Exercise 1

Create a randomization distribution that represents the null hypothesis of no difference in the proportions associated with the 2 treatment groups (using the do function, along with 5000 simulations). 

```{r}
rand.dist<-do(5000)*{
   phats<-diffprop(result~shuffle(group), data=eggdat)
}
gf_histogram(~diffprop, data=rand.dist)
(pval.diffprop<-prop(~diffprop >= p1-p2, data=rand.dist))
```
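To make the shuffling step concrete, here is a base R sketch of the same idea on hypothetical 0/1 data. The data, group labels, seed, and number of shuffles are all assumptions for illustration; `sample()` plays the role of `shuffle()`.

```r
set.seed(1)  # arbitrary seed for this illustration

# Hypothetical data: 1 = event, 0 = no event, in two groups of 20
outcome <- c(rep(1, 14), rep(0, 6), rep(1, 7), rep(0, 13))
grp <- rep(c("A", "B"), each = 20)

# Observed difference in proportions (A minus B)
obs.diff <- mean(outcome[grp == "A"]) - mean(outcome[grp == "B"])

# Shuffle the group labels to mimic the null hypothesis of no group effect
null.diffs <- replicate(2000, {
  shuffled <- sample(grp)
  mean(outcome[shuffled == "A"]) - mean(outcome[shuffled == "B"])
})

# One-sided p-value: share of shuffles at least as large as observed
p.perm <- mean(null.diffs >= obs.diff)
p.perm
```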


#### Exercise 2

Create a bootstrap distribution for the difference in proportions (using 5000 simulations), and calculate a 99% confidence interval using this distribution.

```{r}
boot.diffprop<-do(5000)*{
   phats<-diffprop(result~group, data=resample(eggdat, groups=group))
}
gf_histogram(~diffprop, data=boot.diffprop) 
confint(boot.diffprop, level=0.99, method="percentile") 
```
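The percentile method can also be sketched in base R. The example below resamples within each of two hypothetical groups (the data and seed are assumptions for illustration) and takes the middle 99% of the bootstrap differences.

```r
set.seed(2)  # arbitrary seed for this illustration

# Hypothetical 0/1 outcomes in two groups
out.a <- c(rep(1, 14), rep(0, 6))
out.b <- c(rep(1, 7), rep(0, 13))

# Resample within each group, with replacement, and recompute the difference
boot.diffs <- replicate(2000, {
  mean(sample(out.a, replace = TRUE)) - mean(sample(out.b, replace = TRUE))
})

# Percentile interval: middle 99% of the bootstrap distribution
boot.ci <- quantile(boot.diffs, probs = c(0.005, 0.995))
boot.ci
```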


#### Exercise 3

QUESTION:  How do your results from the simulation-based approach (Exercises 1 and 2) compare to those from the formula-based approach (Exercises 4 and 5 under "Inference Using Formulas")?

ANSWER:   


## Inference for means

### Mercury data

```{r}
hgdat<-read.csv("hgdat.csv")
```

###  Using Formulas

#### Exercise 1

Hint: create dotplots for both the original mercury data and the log-transformed data. Alternatively, use gf_dhistogram along with gf_fitdistr(dist="dnorm").
```{r}
 
```

QUESTION:  In Rave et al. (2014), we log-transformed the mercury concentrations before testing for a difference in contaminant levels between years. Why? What assumptions do we need to meet to use the t-distribution?

ANSWER:   

#### Exercise 2

Calculate sample means, the difference in means, and the standard deviations for the log-transformed values in 1981 and in 2004. Also, calculate the number of observations in each group.


```{r}
(mean.hg<-mean(log.Hg~year, data=hgdat))
(diff.mean<-diffmean(log.Hg~year, data=hgdat))
(favs<-favstats(log.Hg~year, data=hgdat))
```

#### Exercise 3

Test whether the mercury levels differ in the two years. 

```{r}
SE.diffmean<-(sqrt(favs$sd[1]^2/favs$n[1]+favs$sd[2]^2/favs$n[2]))
(tstat.diffmean<-diff.mean/SE.diffmean)

```

Use tstat.diffmean and xpt to calculate the p-value, below:

```{r}
xpt() # FILL IN THE CORRECT INFORMATION HERE TO GET THE P-value
```
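A minimal base R sketch of a two-sided p-value from a t-statistic, using `pt()`. Both the statistic and the degrees of freedom are hypothetical; for the lab you will need to decide which degrees of freedom are appropriate (for example, a conservative choice or the Welch approximation).

```r
# Hypothetical t-statistic and degrees of freedom (illustration only)
t.example <- 2.3
df.example <- 20

# Two-sided p-value: probability in both tails beyond |t|
p.t.two.sided <- 2 * pt(-abs(t.example), df = df.example)
p.t.two.sided
```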


QUESTION:  What is the p-value? 

ANSWER:

#### Exercise 4

Create a 90% confidence interval for the difference in mean log-concentrations.

```{r}


```
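One way to build a t-based interval from summary statistics is sketched below. All numbers are hypothetical, and the Welch-Satterthwaite degrees of freedom are an assumption made here; your course may use a different rule.

```r
# Hypothetical summary statistics for two groups (illustration only)
m1 <- 1.8; s1 <- 0.6; n1 <- 25
m2 <- 1.2; s2 <- 0.9; n2 <- 30

# Unpooled standard error of the difference in means
se.ex <- sqrt(s1^2 / n1 + s2^2 / n2)

# Welch-Satterthwaite approximation to the degrees of freedom
df.welch <- se.ex^4 /
  ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))

# t* for a 90% interval leaves 5% in each tail
tstar <- qt(0.95, df = df.welch)
ci.t <- (m1 - m2) + c(-1, 1) * tstar * se.ex
ci.t
```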


## Inference using t.test

#### Exercise 1

Use t.test(log.Hg ~ year, data=datasetname, conf.level=0.90) to calculate a p-value and create a 90% confidence interval.

```{r}
 
```
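The formula interface can be tried on simulated data first. In this sketch the data frame `toy`, its columns, and the seed are hypothetical; only the form of the `t.test()` call carries over to the lab.

```r
set.seed(3)  # arbitrary seed for this illustration

# Hypothetical data frame with a numeric response and a two-level group
toy <- data.frame(
  y = c(rnorm(15, mean = 1.0, sd = 0.5), rnorm(15, mean = 1.5, sd = 0.5)),
  g = rep(c("early", "late"), each = 15)
)

# Welch two-sample t-test (the default) with a 90% confidence interval
tt <- t.test(y ~ g, data = toy, conf.level = 0.90)
tt$p.value
tt$conf.int
```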

QUESTION: Interpret the confidence interval in the context of the problem.

ANSWER:  


## Inference Using Simulation


#### Exercise 1

Create a 90% bootstrap confidence interval for the difference in means.

 
```{r}
boot.diffmean<-do(1000)*{
  diffmean(log.Hg~year, data=resample(hgdat))
}

confint(boot.diffmean, level=0.90, method="percentile") 
```


#### Exercise 2

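QUESTION:  How does the bootstrap confidence interval compare to the confidence interval from t.test?

ANSWER:  The results are very similar (off by 1-2 units in the second decimal place).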

# Extra problems

## Confidence interval overlap versus a test for a difference in means

### Exercise 1

Calculate a 95% confidence interval for the mean of the log Hg measurements in 1981 and in 2004.  Then conduct a t-test for the difference in means using conf.level = 0.95.

```{r}
  

```
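The idea behind this section can be illustrated with a small large-sample sketch. The means and standard errors below are hypothetical, and normal critical values are used for simplicity: the two separate 95% intervals overlap, yet the test for the difference gives a p-value below 0.05.

```r
# Hypothetical group means and standard errors (large-sample illustration)
mean.a <- 0.0; se.a <- 0.3
mean.b <- 1.0; se.b <- 0.3
z95 <- qnorm(0.975)

# Separate 95% intervals for each mean
ci.a <- mean.a + c(-1, 1) * z95 * se.a
ci.b <- mean.b + c(-1, 1) * z95 * se.b
intervals.overlap <- ci.a[2] > ci.b[1]

# The test uses the SE of the difference, which is smaller than se.a + se.b
se.diff <- sqrt(se.a^2 + se.b^2)
p.diff.example <- 2 * pnorm(-abs((mean.b - mean.a) / se.diff))

c(overlap = intervals.overlap, p.value = p.diff.example)
```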

 
### Paired data

```{r}
#| comment: NA
data(Blackbirds)
```


#### Exercise 1
 
Use t.test to perform a test and compute a 95% confidence interval for the mean difference in logged before-after measurements. Interpret the results in context.

```{r}
 
```
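As a reminder of what a paired analysis does, the sketch below uses hypothetical before/after values (not the Blackbirds data) to show that a paired t-test matches a one-sample t-test on the within-subject differences.

```r
# Hypothetical before/after measurements on the same six subjects
before <- c(4.1, 3.8, 5.0, 4.6, 4.9, 4.3)
after  <- c(4.6, 4.0, 5.4, 4.5, 5.6, 4.9)

# Paired t-test on the two columns
paired.res <- t.test(after, before, paired = TRUE, conf.level = 0.95)

# Equivalent one-sample t-test on the within-subject differences
onesample.res <- t.test(after - before, conf.level = 0.95)

c(paired = paired.res$p.value, one.sample = onesample.res$p.value)
```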