---
title: "Lab 2: Exploring Data, One and Two Variable Summaries"
author:  "ADD YOUR NAME HERE"
date: now
date-format: "DD/MM/YYYY HH:mm"
format:
    pdf
  #  html:
  #    theme: a11y
  #    highlight-style: a11y
  #    self-contained: true
---


## Load packages

```{r}
#| warning: false
#| message: false
library(mosaic)  # tally(), formula-based mean(), and the gf_* plotting functions
library(knitr)
library(dplyr)
```


## Load the white-tailed deer data

```{r}
# Read the capture data (the CSV must be in the working directory)
deerdat <- read.csv("DeerCaptures.csv")
```

  

## Section 1.1 

#### Exercise 1:

Consider the white-tailed deer data. What are the cases?

Answer:


Which variables are categorical and which are quantitative? Type your answers below:

Answer:


## Section 2.1

#### Exercise 1:

How many deer were captured at each site?  Use the tally function to find out.
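
For syntax reference only, here is a minimal sketch of `tally()` on R's built-in `mtcars` data (not the deer data), counting cars by number of cylinders (`cyl`). It assumes the mosaic package is installed; for the exercise you would swap in the `site` variable and `deerdat`.

```r
library(mosaic)  # provides tally()

# Frequency table: number of cars with 4, 6, and 8 cylinders
cyl_counts <- tally(~ cyl, data = mtcars)
cyl_counts
```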

```{r}
 
```
 
#### Exercise 2:

What proportion of the deer in this study were captured at Dirty Nose (DN)? Again, use the *tally* function to find out. Hint: you will need the *format* argument of *tally*. If you get stuck, look at the examples at the end of the help file for *tally* (`?tally`).
```{r}
 
```

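
As a hedged illustration on the built-in `mtcars` data rather than the deer data, the `format` argument of `tally()` switches the table from counts to relative frequencies. The name `cyl_props` is just an illustrative variable name.

```r
library(mosaic)

# Relative frequencies instead of counts (format = "percent" is another option)
cyl_props <- tally(~ cyl, data = mtcars, format = "proportion")
cyl_props

# Pull out one category's proportion by its label
cyl_props["8"]
```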
 

#### Exercise 3:  

Create a bar graph showing the number of deer captured at each of the four study sites, using the function `gf_bar`. Then try `gf_percents`, which shows percentages instead of counts.
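
A minimal sketch of the two functions on the built-in `mtcars` data (stand-in data, not the deer data). Converting `cyl` to a factor so it is treated as a category, and the name `mtcars2`, are illustrative choices; the deer data's `site` column may already be a character variable.

```r
library(mosaic)  # loading mosaic also makes the ggformula gf_* functions available

# Treat cylinder count as a category rather than a number
mtcars2 <- transform(mtcars, cyl = factor(cyl))

p_count   <- gf_bar(~ cyl, data = mtcars2)       # bar heights are counts
p_percent <- gf_percents(~ cyl, data = mtcars2)  # bar heights are percentages
p_count
p_percent
```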

```{r}
 
```


#### Exercise 4: 

Use the *pie* function along with the *tally* function to make a pie chart illustrating the proportion of captured deer at each study site. Hint: example code can be found in the Section 2.1 lecture.
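
One way to combine the two functions, sketched on the built-in `mtcars` data as a stand-in: base R's `pie()` takes a vector of counts, and a `tally()` table works directly, with its category labels used as slice labels.

```r
library(mosaic)

# Counts per category, then one pie slice per category
cyl_counts <- tally(~ cyl, data = mtcars)
pie(cyl_counts, main = "Cars by number of cylinders")
```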

```{r}
 
```

#### Exercise 5:   

Question: Consider the graphs and the frequency table above. Which site had the most deer included in the study?

Answer:


#### Exercise 6:  

Suppose a researcher compares survival rates among the four study sites and finds that survival is lower at Dirty Nose (DN) than at the other three sites. After this discovery, the researcher decides to explore whether more fawns might have been captured at Dirty Nose (since fawns survive at lower rates than adults). Create an appropriate bar graph (side-by-side or multi-panel) to explore this question. This time, use `gf_percents`.
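
A hedged sketch of both layouts on the built-in `mtcars` data, with cylinder group standing in for site and transmission type standing in for age class. The name `mtcars2` is illustrative. Check `?gf_percents` for how the percentages are computed in each layout, since that affects how the bars should be read.

```r
library(mosaic)

# Categorical versions of cylinders and transmission type (am: 0 = automatic, 1 = manual)
mtcars2 <- transform(mtcars,
                     cyl = factor(cyl),
                     am  = factor(am, labels = c("automatic", "manual")))

# Side-by-side bars: transmission type within each cylinder group
p_dodge <- gf_percents(~ cyl, data = mtcars2, fill = ~ am, position = "dodge")

# Multi-panel bars: one panel per cylinder group
p_panel <- gf_percents(~ am | cyl, data = mtcars2)
p_dodge
p_panel
```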

```{r}
 
```

 

#### Exercise 7:

Create a mosaic plot to further explore this question. 
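
A minimal sketch, again on the built-in `mtcars` data as a stand-in: a two-way `tally()` table can be passed to base R's `mosaicplot()`. Here the column widths reflect the size of each cylinder group and the tile heights show the transmission mix within each group.

```r
library(mosaic)

# Two-way table of cylinders by transmission type (am: 0 = automatic, 1 = manual)
cyl_am <- tally(~ cyl + am, data = mtcars)

# Base-R mosaic plot of the two-way table
mosaicplot(cyl_am, main = "Transmission type by cylinders")
```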

```{r}
 
```

#### Exercise 8:

Lastly, look at the relative frequencies of adults and fawns at each site using `tally(~ageclass | site, data = deerdat, format = "proportion")`.


```{r}
 

```

Question: Do you think differences in age composition among the four study sites might be important when comparing survival rates? Briefly justify your answer by referring to the table and the plots above.
 

Answer:

 
## Section 2.2

#### Exercise 1:

Create a dotplot illustrating the ages of the captured deer using the function *gf_dotplot*.  Describe the shape of the distribution (is it symmetric or skewed [and, if skewed, in which direction]? is it bimodal?)
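
For syntax reference, a sketch of `gf_dotplot()` on the built-in `mtcars` data (fuel economy in `mpg` stands in for age). The `binwidth` value is an arbitrary choice for this stand-in variable and would need adjusting for ages.

```r
library(mosaic)

# One dot per car, stacked into bins one mpg wide
p_dots <- gf_dotplot(~ mpg, data = mtcars, binwidth = 1)
p_dots
```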

```{r}
 
```

 

#### Exercise 2: 

Calculate the mean age at capture. Note: some individuals have missing ages (recorded as `NA` in the data frame), so you will have to add the argument `na.rm = TRUE` to the *mean* function.
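
To see why `na.rm = TRUE` matters, here is a sketch on R's built-in `airquality` data, whose `Ozone` column contains missing values. It uses mosaic's formula interface to `mean()`; the deer data's age column name is not shown in this lab, so it is not used here.

```r
library(mosaic)

# Ozone has NA values, so a plain mean returns NA
mean(~ Ozone, data = airquality)

# na.rm = TRUE drops the missing values before averaging
ozone_mean <- mean(~ Ozone, data = airquality, na.rm = TRUE)
ozone_mean
```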

```{r}
 
```

#### Exercise 3:

Create a histogram of weights at capture using the *gf_histogram* function. Describe the shape of the distribution (is it symmetric or skewed [and, if skewed, in which direction]? is it bimodal?)
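
A minimal `gf_histogram()` sketch on the built-in `mtcars` data, with `mpg` standing in for weight. The `binwidth` of 2 is an arbitrary illustrative value; changing it can make a distribution's shape look quite different, so it is worth trying a few.

```r
library(mosaic)

# Distribution of fuel economy; binwidth is the width of each bar in mpg
p_hist <- gf_histogram(~ mpg, data = mtcars, binwidth = 2)
p_hist
```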

```{r}
 
```
 
## Section 2.4

#### Exercise 1:

Create side-by-side boxplots, summarizing weights at capture for fawns and adults.  Use the *gf_boxplot* function.
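
A sketch of the two-sided formula `y ~ group` with `gf_boxplot()`, on the built-in `mtcars` data: fuel economy (`mpg`) by transmission type stands in for weight by age class. The name `mtcars2` is illustrative.

```r
library(mosaic)

# Label transmission type (am: 0 = automatic, 1 = manual) as a category
mtcars2 <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# One boxplot of mpg per transmission type
p_box <- gf_boxplot(mpg ~ am, data = mtcars2)
p_box
```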

```{r}
 
```

#### Exercise 2:

Create side-by-side histograms illustrating the distribution of weights at capture for fawns and adults.
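
In ggformula, a `| group` term in the formula splits a plot into panels. A sketch on the built-in `mtcars` data (stand-in variables and the illustrative name `mtcars2`):

```r
library(mosaic)

mtcars2 <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# One histogram panel per transmission type
p_facet <- gf_histogram(~ mpg | am, data = mtcars2, binwidth = 2)
p_facet
```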

```{r}
 
```

#### Exercise 3:

Create side-by-side smooth histograms (kernel density estimates), illustrating the distribution of weights at capture for fawns and adults.  Also, create smoothed histograms that are overlaid on the same plot.  In both cases, use the function *gf_density*. 
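
A hedged sketch of both requested layouts with `gf_density()`, on the built-in `mtcars` data as a stand-in. Panels come from `| am` in the formula; overlaying uses `fill = ~ am`, and the `alpha` value (transparency) is an illustrative choice so overlapping curves stay visible.

```r
library(mosaic)

mtcars2 <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Side-by-side: one density curve per panel
p_dens_panels <- gf_density(~ mpg | am, data = mtcars2)

# Overlaid: both curves on one set of axes, distinguished by fill colour
p_dens_overlay <- gf_density(~ mpg, data = mtcars2, fill = ~ am, alpha = 0.4)
p_dens_panels
p_dens_overlay
```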

```{r}
 
```

#### Exercise 4:

Calculate the mean weight for fawns and adults (hint: you can do this with one line of code). Again, you will have to supply the `na.rm = TRUE` argument to the *mean* function.
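
With mosaic loaded, `mean()` accepts a two-sided formula `y ~ group` and returns one mean per group. A sketch on the built-in `airquality` data (mean ozone by month stands in for mean weight by age class):

```r
library(mosaic)

# Mean ozone for each month (5 = May, ..., 9 = September), ignoring missing values
ozone_by_month <- mean(Ozone ~ Month, data = airquality, na.rm = TRUE)
ozone_by_month
```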

```{r}
 
```

#### Exercise 5:

Why do you think the distribution of weights (Section 2.2, Exercise 3) was bimodal?

Answer:

 
