Sampling words from Darwin’s Origin of Species
Load libraries
library(mosaic)
library(dplyr)
library(googledrive)
library(googlesheets4)
Read in data
Data are read in from Google Sheets (code not shown).
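The reading code is not shown, but the rows printed below suggest that phat is Blue / Total (for example, 24 / 57 ≈ 0.421) and that year is the year of the Timestamp. Here is a minimal sketch of deriving those two columns. It assumes the sheet supplies Timestamp, Blue, and Total columns, and the helper name add_phat_year is hypothetical:

```r
# Hypothetical helper: add phat (proportion blue) and year columns,
# assuming `raw` has Timestamp, Blue, and Total columns.
add_phat_year <- function(raw) {
  raw$phat <- raw$Blue / raw$Total
  # as.POSIXct() parses "YYYY-mm-dd HH:MM:SS" strings; format() extracts the year
  raw$year <- as.integer(format(as.POSIXct(raw$Timestamp, tz = "UTC"), "%Y"))
  raw
}

# Usage with the two rows that are printed later in this document
raw <- data.frame(
  Timestamp = c("2023-09-21 11:52:09", "2023-09-21 05:56:11"),
  Blue = c(17, 24),
  Total = c(4, 57)
)
mandm_demo <- add_phat_year(raw)
mandm_demo
```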
Data cleaning:
There was one observation with p̂ > 1, likely because the data were entered in the wrong columns. I will drop that observation (though one could instead assume that the Total and Blue columns were swapped for that case).
filter(mandm, phat > 1)  # show the observation with an impossible proportion
            Timestamp Blue Total phat year
1 2023-09-21 11:52:09   17     4 4.25 2023
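For comparison, here is the alternative mentioned above. It keeps the row by assuming Blue and Total were entered in each other's columns. This is a sketch of that option, not the approach used in this analysis, and the helper name swap_blue_total is hypothetical. When Total > 0, a row has p̂ > 1 exactly when Blue > Total, so that condition is used to find the rows to swap:

```r
# Hypothetical helper: swap Blue and Total wherever Blue > Total
# (i.e., phat > 1), then recompute phat.
swap_blue_total <- function(df) {
  bad <- df$Blue > df$Total
  blue_orig <- df$Blue
  df$Blue[bad] <- df$Total[bad]
  df$Total[bad] <- blue_orig[bad]
  df$phat <- df$Blue / df$Total
  df
}

# Usage with the problem row shown above (Blue = 17, Total = 4)
bad_row <- data.frame(Blue = 17, Total = 4)
fixed <- swap_blue_total(bad_row)
fixed  # Blue = 4, Total = 17, phat = 4/17 ≈ 0.235
```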
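mandm <- filter(mandm, phat <= 1.0)  # keep only valid proportions (drops the row with phat > 1)
There was also one observation with 57 M&M’s (likely a case where someone was sick or excused from attending and purchased a full-sized bag). I drop it as well, keeping only bags with at most 20 M&M’s.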
filter(mandm, Total > 20)  # show the unusually large bag
            Timestamp Blue Total      phat year
1 2023-09-21 05:56:11   24    57 0.4210526 2023
mandm <- filter(mandm, Total <= 20)  # drop the full-sized bag
Visualize the sampling distributions
First, the proportion of M&M’s in each bag that are blue.
gf_histogram(~phat | year, data = mandm, binwidth = 0.05, xlab = expression(hat(p))) %>%
  gf_vline(xintercept = ~0.16) +
  theme(text = element_text(size = 20))  # change font size of all text
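To see what the histogram would look like if every bag were a random sample from a population with a fixed blue proportion, one option is to simulate it. The sketch below treats the reference line at 0.16 as the population proportion p and uses a hypothetical bag size of n = 17. Both are assumptions made for illustration, not values taken from the data. It then compares the simulated spread with the standard-error formula SE(p̂) = sqrt(p(1 − p)/n), which gives about 0.089 for these values:

```r
set.seed(2023)
p <- 0.16     # assumed population proportion (the reference line above)
n <- 17       # hypothetical bag size, for illustration only
reps <- 10000 # number of simulated bags

# Each replicate: number of blue M&M's in a bag of n, divided by n
phat_sim <- rbinom(reps, size = n, prob = p) / n

# Theoretical standard error of a sample proportion
se_theory <- sqrt(p * (1 - p) / n)

c(mean = mean(phat_sim), sd = sd(phat_sim), se_theory = se_theory)
```

The simulated mean should land close to p, and the simulated standard deviation should land close to se_theory. The `binwidth = 0.05` above is a judgment call: with bags of at most 20 M&M’s, the possible values of p̂ are spaced at least 1/20 = 0.05 apart.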
Next, the number of M&M’s in each bag.
gf_dotplot(~Total|year, data=mandm, binwidth=1, dotsize=0.15) +
theme(text=element_text(size=20)) #change font size of all text
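Because the number of M&M’s differs from bag to bag, each p̂ in the first histogram comes from a slightly different sample size n. The same standard-error formula, SE(p̂) = sqrt(p(1 − p)/n), shows how much that matters. The small sketch below assumes p = 0.16 (the reference value drawn on the histogram), and the bag sizes are hypothetical:

```r
# Standard error of a sample proportion for sample size n
se_phat <- function(p, n) sqrt(p * (1 - p) / n)

bag_sizes <- c(10, 15, 20)  # hypothetical bag sizes
round(se_phat(0.16, bag_sizes), 3)  # 0.116 0.095 0.082
```

Smaller bags give noisier estimates of the blue proportion. This is one reason the spread in the p̂ histogram depends partly on the spread in the bag-size dot plot.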