---
title: "Lab 3: Two variable summaries, Sampling Distributions"
author:  "ADD YOUR NAME HERE"
date: now
date-format: "DD/MM/YYYY HH:MM"
format:
    pdf
 #   html:
 #     theme: a11y
 #     highlight-style: a11y
 #     self-contained: true
---

#### Load R libraries

```{r}
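#| warning: false
#| message: false
library(mosaic)     # do(), favstats(), and formula-based summaries
library(knitr)      # kable() for formatted tables
library(dplyr)      # filter() and other data-manipulation verbs
library(abd)        # datasets from The Analysis of Biological Data
library(Lock5Data)  # datasets for Statistics: Unlocking the Power of Data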
```


## Standing on one foot

### Load the data

The file `TippyAll2025.csv` contains the times students were able to stand on one foot, along with their heights.

```{r}
onefoot <- read.csv("TippyAll2025.csv")
```


### Exercise 1

Is there a relationship between height and amount of time students can stand on one foot? Justify your answer and include supporting information (e.g., summary statistics and one or more visualizations of the data) below:
```{r}


```
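A minimal sketch of one approach, assuming height is recorded in categories (Exercise 2 mentions height categories). The data frame `tippy`, its columns `HeightCat` and `Time`, and every value are hypothetical stand-ins, not the contents of `TippyAll2025.csv`; swap in `onefoot` and its real column names.

```r
# Hypothetical stand-in data: names and values are invented for illustration
tippy <- data.frame(
  HeightCat = factor(rep(c("Short", "Medium", "Tall"), each = 4),
                     levels = c("Short", "Medium", "Tall")),
  Time      = c(14, 22, 30, 41,   # Short
                18, 25, 33, 38,   # Medium
                20, 27, 29, 45)   # Tall (seconds on one foot)
)

# Summary statistics of Time within each height category
group_stats <- aggregate(Time ~ HeightCat, data = tippy,
                         FUN = function(t) c(n = length(t), mean = mean(t),
                                             median = median(t), sd = sd(t)))
print(group_stats)

# Side-by-side boxplots, one box per height category
boxplot(Time ~ HeightCat, data = tippy,
        xlab = "Height category", ylab = "Time on one foot (s)")
```

With mosaic loaded, `favstats(Time ~ HeightCat, data = tippy)` produces a similar grouped summary table. If height is instead recorded as a number (say, in a column named `Height`, also an assumption), `cor()` and a scatterplot such as `plot(Time ~ Height, data = onefoot)` are the analogous tools.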

Answer: 


### Exercise 2

Are there any outliers in the data set? Note: if you use boxplots to identify outliers, your answer may depend on whether you draw side-by-side boxplots (to look for "outliers" within each height category, considered separately) or pool all of the data (to look for extreme observations in the data set as a whole). Let your answer to Exercise 1 guide your approach here.
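The pooled-versus-grouped distinction can be seen in a small sketch. It uses base R's `boxplot.stats()`, whose `$out` element holds the points that `boxplot()` would draw as outliers (those more than 1.5 times the box length beyond the hinges). The data frame, its column names, and its values are hypothetical.

```r
# Hypothetical data: column names and values are invented for illustration
tippy <- data.frame(
  HeightCat = rep(c("Short", "Tall"), each = 6),
  Time      = c(12, 15, 18, 20, 22, 60,   # Short
                40, 45, 50, 55, 58, 62)   # Tall
)

# Pooled: look for extreme values in the whole data set at once
pooled_out <- boxplot.stats(tippy$Time)$out

# Grouped: apply the same rule within each height category separately
by_group_out <- lapply(split(tippy$Time, tippy$HeightCat),
                       function(t) boxplot.stats(t)$out)

pooled_out     # numeric(0): nothing is extreme relative to everyone
by_group_out   # 60 is flagged, but only within the Short group

# The two views side by side
old_par <- par(mfrow = c(1, 2))
boxplot(tippy$Time, main = "Pooled", ylab = "Time (s)")
boxplot(Time ~ HeightCat, data = tippy, main = "By height category")
par(old_par)
```

Here 60 seconds is unremarkable among all students, because the Tall group's times are similar, yet it stands far above the rest of the Short group.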

Answer:


First, describe the Time variable with the outlier(s). Second, create a dataset with the outlier(s) removed and describe the Time variable without the outliers. Comment on the changes.

Below, write your code to describe the data with the outlier(s) included.
```{r}


```

Now, write your code to remove any outliers (hint: remember the *filter* function), and look at the same set of summary statistics. 
```{r}


```
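One way to carry out these two steps, sketched on hypothetical data: flag outliers with `boxplot.stats()`, drop them, and recompute the same summaries. Base R's `subset()` is used so the sketch runs without extra packages; the `Time` values are invented.

```r
# Hypothetical Time values (seconds); 95 plays the role of an outlier
tippy <- data.frame(Time = c(12, 15, 18, 20, 22, 25, 28, 30, 95))

# Points beyond the boxplot fences
outliers <- boxplot.stats(tippy$Time)$out

# Keep every row whose Time is not one of the flagged values
tippy_trimmed <- subset(tippy, !(Time %in% outliers))

# Same summaries with and without the outlier
summarize_time <- function(t) c(n = length(t), mean = mean(t),
                                median = median(t), sd = sd(t))
rbind(with_outlier    = summarize_time(tippy$Time),
      without_outlier = summarize_time(tippy_trimmed$Time))
```

In this example the mean and standard deviation drop noticeably while the median barely moves, since the median is resistant to extreme values. With dplyr loaded, `filter(tippy, !(Time %in% outliers))` keeps the same rows.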

Comment on any changes here:



### Exercise 3

Read in the pulse data.

```{r}
pulseall <- read.csv("pulseall2024.csv")

```
 


### Exercise 4

Is there a relationship between treatment (exercise Yes/No) and pulse rate? Describe the relationship using at least one summary statistic (broken down by group), at least one visualization, and a description in your own words.

Include your code below:
```{r}


```
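A minimal sketch of a grouped summary and a side-by-side boxplot for a Yes/No treatment. The data frame `pulse`, its columns `Exercise` and `Pulse`, and all values are hypothetical; the columns in `pulseall2024.csv` may be named differently.

```r
# Hypothetical pulse data: column names and values are invented
pulse <- data.frame(
  Exercise = rep(c("No", "Yes"), each = 5),
  Pulse    = c(68, 72, 70, 75, 66,    # did not exercise
               92, 88, 101, 95, 90)   # exercised
)

# Group-wise summaries of pulse rate
group_means   <- tapply(pulse$Pulse, pulse$Exercise, mean)
group_medians <- tapply(pulse$Pulse, pulse$Exercise, median)
group_means
group_medians

# Difference in mean pulse (Yes minus No)
diff_means <- as.numeric(group_means["Yes"] - group_means["No"])
diff_means

boxplot(Pulse ~ Exercise, data = pulse,
        xlab = "Exercised?", ylab = "Pulse (beats per minute)")
```

With mosaic loaded, `favstats(Pulse ~ Exercise, data = pulse)` reports n, mean, median, standard deviation, and more for each group in one table.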
 
Answer (describe the relationship here):


## Lock5Data Questions

### Exercise 1


Choose two categorical variables of interest to you, and describe their relationship. Include summary statistic(s), at least one visualization, and a description in your own words.


Include your code below:
```{r}


```
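A minimal sketch of the usual tools for two categorical variables: a two-way table, conditional (row) proportions, and a mosaic plot. It uses base R's built-in `mtcars` data (transmission type `am` and engine shape `vs`) as a stand-in so it runs without extra packages; `mtcars` is not a Lock5Data dataset.

```r
# Stand-in data: two categorical variables built from base R's mtcars
car_cats <- data.frame(
  transmission = factor(mtcars$am, levels = c(0, 1),
                        labels = c("automatic", "manual")),
  engine       = factor(mtcars$vs, levels = c(0, 1),
                        labels = c("V-shaped", "straight"))
)

# Two-way table of counts
tab <- with(car_cats, table(transmission, engine))
tab

# Row proportions: distribution of engine shape within each transmission type
row_props <- prop.table(tab, margin = 1)
round(row_props, 2)

# Mosaic plot: tile areas are proportional to the cell counts
mosaicplot(tab, main = "Engine shape by transmission", color = TRUE)
```

With mosaic loaded, `tally(~ transmission + engine, data = car_cats)` produces the same two-way table of counts.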
 
Answer (describe the relationship here):

### Exercise 2


Choose two quantitative variables of interest to you, and describe their relationship.


Include your code below:
```{r}


```
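A minimal sketch for two quantitative variables: the correlation coefficient, a least-squares line, and a scatterplot. It again uses base R's `mtcars` (car weight `wt` and fuel economy `mpg`) as a stand-in for a Lock5Data dataset.

```r
# Stand-in data: car weight (1000 lbs) and fuel economy (mpg) from base R's mtcars
r   <- cor(mtcars$wt, mtcars$mpg)    # correlation coefficient
fit <- lm(mpg ~ wt, data = mtcars)   # least-squares regression line

r
coef(fit)

plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)   # overlay the fitted line
```

A strongly negative `r` here indicates that heavier cars tend to get fewer miles per gallon. The correlation measures only linear association, so check the scatterplot for curvature or outliers as well.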
 
Answer (describe the relationship here):



## Sampling Distributions

### Setting the Seed 

Before we explore the concept of a sampling distribution, let's set a seed for R's random number generator so that we get the same random samples (and therefore the same answers) each time we render the lab document.

```{r}


```
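A quick illustration of what setting the seed does, using base R's `set.seed()`. The seed value 2025 is an arbitrary choice; any integer works.

```r
set.seed(2025)             # fix the starting point of the random stream
first <- sample(1:100, 5)

set.seed(2025)             # resetting the seed replays the same stream
second <- sample(1:100, 5)

identical(first, second)   # TRUE
```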


### Exercise 1

Using the *do* function, take 1000 samples of size 10 from the list of words in the passage from Darwin's *On the Origin of Species*, and compute the mean of each sample. Create a histogram depicting the sampling distribution of the sample means.

```{r}
Darwin <- read.csv("Darwin.csv")



```
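A sketch of the same idea in base R, using `replicate()` in place of mosaic's `do()`. Because the columns of `Darwin.csv` are not shown here, it draws from a hypothetical population `word_lengths` of invented word lengths; in the lab, sample from the relevant column of `Darwin` instead.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: lengths (in letters) of 500 words, invented for illustration
word_lengths <- sample(1:12, size = 500, replace = TRUE,
                       prob = c(3, 17, 20, 16, 11, 9, 8, 6, 4, 3, 2, 1))

# 1000 random samples of 10 words (without replacement); keep each sample mean
sample_means <- replicate(1000, mean(sample(word_lengths, size = 10)))

hist(sample_means, breaks = 30,
     main = "Sampling distribution of the sample mean (n = 10)",
     xlab = "Mean word length in a sample of 10 words")
```

With mosaic loaded, `do(1000) * mean(sample(word_lengths, 10))` performs the same repetition and returns a data frame with one row per sample.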


Question: What is the standard error of $\bar{x}$ when taking samples of 10 words? Include your code below. Hint: the standard error is the standard deviation of the sampling distribution, and the *sd* function calculates a standard deviation. If you do this correctly, your answer will appear when you render your file.


```{r}



```
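Since the standard error of $\bar{x}$ is the standard deviation of its sampling distribution, one sketch is simply `sd()` applied to the simulated means. For comparison, the code also computes the formula value $\sigma/\sqrt{n}$, which should be close when the population is much larger than the sample (it ignores the finite-population correction). The population `word_lengths` is hypothetical, with invented values.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical population: 500 invented word lengths
word_lengths <- sample(1:12, size = 500, replace = TRUE,
                       prob = c(3, 17, 20, 16, 11, 9, 8, 6, 4, 3, 2, 1))

n <- 10
sample_means <- replicate(1000, mean(sample(word_lengths, size = n)))

# Standard error estimated from the simulation
se_simulated <- sd(sample_means)

# Formula value sigma / sqrt(n), ignoring the finite-population correction
se_formula <- sd(word_lengths) / sqrt(n)

c(simulated = se_simulated, formula = se_formula)
```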

 


### Exercise 2

Now, create a figure to illustrate the sampling distribution of the mean when taking samples of size 35. Include your code below:

```{r}


```
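A sketch that wraps the simulation in a small helper, `sampling_dist()` (a hypothetical name, not a library function), so the same code handles any sample size. It draws from a hypothetical `word_lengths` population of invented values and plots n = 10 and n = 35 on a shared x-axis so the difference in spread is visible.

```r
# Hypothetical helper: simulated sample means for samples of size n
sampling_dist <- function(population, n, reps = 1000) {
  replicate(reps, mean(sample(population, size = n)))
}

set.seed(42)  # arbitrary seed for reproducibility
word_lengths <- sample(1:12, size = 500, replace = TRUE,
                       prob = c(3, 17, 20, 16, 11, 9, 8, 6, 4, 3, 2, 1))

means_10 <- sampling_dist(word_lengths, n = 10)
means_35 <- sampling_dist(word_lengths, n = 35)

# Standard error = standard deviation of each sampling distribution
c(se_n10 = sd(means_10), se_n35 = sd(means_35))

# Stack the histograms on a shared x-axis
x_range <- range(means_10, means_35)
old_par <- par(mfrow = c(2, 1))
hist(means_10, xlim = x_range, main = "n = 10", xlab = "Sample mean")
hist(means_35, xlim = x_range, main = "n = 35", xlab = "Sample mean")
par(old_par)
```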

Question:  What is the standard error of $\bar{x}$ when samples of size 35 are taken?  

Answer:


### Exercise 3

Question:  What do you think would happen to the sampling distribution if we took samples of size 50?  Why?

Answer:


### Exercise 4

Question: Your guesses from the first day of class (made using non-random sampling) are contained in a file called DarwinYYYY.csv. Read in these data, filter to include just the observations from this year (i.e., year == 2025), and calculate the mean. Compare this value to the sampling distribution(s) you created above. How likely would it be to see a value this extreme if you had taken a random sample?

```{r}
Dguesses <- read.csv("Darwin2024.csv")


```
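A sketch of the comparison, assuming the guesses file has a `year` column (as the question's filter implies) and a column holding each guess; that column name, `avg_length`, and all values below are hypothetical, as is the `word_lengths` population. It measures how often a random sample's mean lands at least as far from the center of the sampling distribution as the class mean does.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical population and its simulated sampling distribution (n = 10)
word_lengths <- sample(1:12, size = 500, replace = TRUE,
                       prob = c(3, 17, 20, 16, 11, 9, 8, 6, 4, 3, 2, 1))
sample_means <- replicate(1000, mean(sample(word_lengths, size = 10)))

# Hypothetical guesses data; avg_length and the values are invented
guesses <- data.frame(
  year       = c(2024, 2024, 2025, 2025, 2025, 2025),
  avg_length = c(5.1, 6.0, 6.8, 7.2, 6.5, 7.0)
)
this_year <- subset(guesses, year == 2025)
observed  <- mean(this_year$avg_length)

# Share of simulated means at least as far from the center as the observed mean
center <- mean(sample_means)
prop_as_extreme <- mean(abs(sample_means - center) >= abs(observed - center))
c(observed = observed, prop_as_extreme = prop_as_extreme)
```

A proportion near zero means a random sample would rarely produce a mean this extreme. With dplyr loaded, `filter(guesses, year == 2025)` selects the same rows as `subset()`.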

Answer:
 
