Lab 7: Normal Distribution and t-distribution

Learning Objectives

Become comfortable using Normal and t-distributions for statistical inference.
Learn how to construct confidence intervals and perform tests using the t.test() and prop.test() functions in R.
Compare simulation and formula-based methods of inference.
Understand that a test for a difference in (means, proportions) may be significant even when confidence intervals for the individual groups overlap.
Compute confidence intervals and tests for paired data using t.test()

In completing the exercises for today’s lab, it will be helpful to have the formula’s from the book handy. For future reference, they are also included on Canvas under Lab 7: Normal Distribution and t-distribution.

Inference for Proportions

Eggshell Removal in Black-headed Gulls

Tinbergen and his collaborators designed a series of experiments to explore hypotheses for why birds dispose of their eggs after hatching. Here, we will consider the data from one such experiment (aimed at exploring whether disposing of eggshells reduces predation risk). Tinbergen et al. laid out Black-headed Gulls’ eggs - with or without an eggshell nearby. They then recorded the number of eggs that were found by predators in each of the two treatment groups.

Picture of Nikolaas Tinbergen. — Nikolaas Tinbergen

We want to test if the presence of an eggshell nearby increases the probability that the eggs will be discovered by predators using the following data:

	Not Predated	Predated	Total
Nest with shell	13	39	52
Nest without Shell	47	13	60
Total	60	52	112

Lab Exercises: Proportions

Because there are at least 10 responses in cell of the table, we can use the normal approximation to conduct our hypothesis test. First, we will calculate everything “by hand.” I.e., you will use the formula sheet from class, along with xpnorm, xqnorm, xpt, and xqt to calculate p-values and confidence intervals using the normal and t-distributions. We will also use R as a calculator along the way. As we work through the calculations, it is best to assign names to the different quantities so we can use these quantities in later calculations.

NOTE: to simplify this lab a bit, I have provided some code for you. Make sure you understand which formulas are used in each case and why!

State the null and alternative hypotheses. Is it appropriate to conduct a one-tailed or two-tailed test? Why?
Under the null hypothesis that the proportion of nests that are predated does not depend on the treatment group, it is appropriate to use a pooled estimate of $\hat{p}$ when calculating the standard error of the difference in proportions. I have included code in your template to calculate this value. Make sure you can locate this formula (see: difference in proportions, Standard Error (test)). This formula is equivalent to what you would get if you calculated the overall proportion nests that are predated (irrespective of the two treatment groups).
Next, we use the pooled $\hat{p}$ to calculate the standard error for the difference in proportions. Again, I have included the code to calculate this SE. Make sure you can locate (and understand) the relevant formula in the Appendix table.
Next, we calculate the test statistic (z), and use the normal distribution to find the p-value. I have included code to calculate the test statistic, but I want you to determine the p-value. (Hint: remember, xpnorm(q= , mean=0, sd=1) gives the area to the left of $q$ for a normal distribution with mean= 0 and sd =1). What does this tell you about the effect of eggshells on the ability of predators to find nests?
It is important to calculate an “effect size” that measures how different the groups are. We can use formulas and the normal distribution to find a 99% confidence interval for the difference in the proportion of nests that are predated between the two treatment groups. I have included code for calculating the SE, but you will need to determine how to get the right critical (z*) value using xqnorm. Also, make sure you can locate the appropriate formula (see: difference in proportions, Standard Error (CI)). Why did we use a different SE when conducting a hypothesis test compared to the SE used to calculate the confidence interval?

Our next approach uses a built in function in R, prop.test(), that automatically conducts a hypothesis test and calculates a confidence interval for a difference in proportions using the normal distribution.

Before we can use this function, we need to reconstruct the dataset in more detail. The following code will create a dataset with 2 variables, one indicating the treatment group (i.e., “Nest with shell” or “Nest without shell”), and the other variable will indicate whether the nests were predated or not.

eggdat<-data.frame(group=c(rep("Nest with shell", 52), 
                           rep("Nest without Shell",60)),
                   result=rep(c("Predated","Not Predated",
                                "Predated","Not Predated"),
                              c(39,13,13,47)))

You can verify that we get the right counts by typing: tally(result~group, data=eggdat, format="count"). You can also have R calculate the sample proportions by using: tally(result~group, data=eggdat, format="proportion").

tally(result~group,data=eggdat, format="proportion")

              group
result         Nest with shell Nest without Shell
  Not Predated       0.2500000          0.7833333
  Predated           0.7500000          0.2166667

prop.test(result~group, data=eggdat, alternative = "less")


    2-sample test for equality of proportions with continuity correction

data:  tally(result ~ group)
X-squared = 29.75, df = 1, p-value = 2.458e-08
alternative hypothesis: less
95 percent confidence interval:
 -1.0000000 -0.3834424
sample estimates:
   prop 1    prop 2 
0.2500000 0.7833333

The argument alternative = "less" is included to specify that the alternative hypothesis is less than. For other tests you would use either alternative = "greater" or alternative = "two.sided".

You get a lot of output from this function. Notice first the p-value at the end of the second line. This may not match the p-value you computed earlier exactly, but it should be pretty close.

Second, you should see a confidence interval for the difference in proportions. Right now, it does not make much sense (the upper bound is 1?). Confidence intervals only make sense with two-sided tests (because you are leaving out the extremes in both tails), so either change alternative to “two-sided”, or just take out the alternative argument. The default confidence level is 95%. To generate a 99% confidence interval to compare to your earlier calculations, use conf.level = .99.

Create a 99% confidence interval using prop.test(). Note: the confidence interval given will differ a bit from the one you calculated earlier (there are a number of ways to compute confidence intervals for a difference in proportions - based on different approximations to the sampling distribution; R uses a slightly more complex method than we learned in class to derive the confidence interval).

I have written the code for you in this section!

Create a randomization distribution that represents the null hypothesis of no difference in the proportions (use the do function, along with 1000 simulations). Calculate the p-value using this distribution. Hint: we can use the diffprop function to calculate the difference in proportions for two groups (just as we used diffmean to calculate a difference in means). For the randomization distribution, we would use: diffprop(y~shuffle(x), data=).
Create a bootstrap distribution for the difference in proportions (using 1000 simulations), and calculate a 99% confidence interval using this distribution. Hint: resample(eggdat, groups=groups) will bootstrap the data for each group separately. Thus, you can use diffprop(result~group, data=resample(eggdat, groups=group)) to create the bootstrap distribution.
How do your results from [parts 1 and 2] compare to those from Exercise 1 [parts 4 and 5]?

In practice, as long as the sample sizes are large enough for the Central Limit Theorem to kick in, either approach will give you essentially the same answer, and you can use whichever method you prefer!

Inference for Means

NOTE: Make sure you completed the exercises on all 3 tabs of Lab Exercises: Proportions before moving on to this section.

The data for this next set of exercises come from a Minnesota DNR research project I worked on with Dave Rave and Mike Zicus (a UMN alum) (See Rave et al. 2014). Mike Zicus conducted a study in 1981, where he collected common goldeneye and hooded merganser eggs throughout the state and measured eggshell thickness and egg-contaminant levels (e.g., DDT, PCBs, mercury). DDT was used (quite effectively) to control insect pests after World War I, but was banned in the early 1970s due to adverse effects on wildlife populations. In particular, DDT has been tied to declines of bald eagles, brown pelicans, and peregrine falcons in North America (if interested, you can learn more at this link).

Picture of Mike Zicus and Dave Rave — Mike Zicus and Dave Rave, MN DNR

In 2003-2004, we decided to collect more eggs from these two species to evaluate whether eggshell thickness had improved since 1981 and also to evaluate whether mercury concentrations had increased since 1981. Chemicals tend to accumulate as you move up the food chain. Mercury levels tend to be higher for hooded mergansers because they eat more fish, whereas common goldeneyes tend to eat more crustaceans, insects, and mollusks. Today, we will explore the mercury data from hooded mergansers, with mercury measured in parts per million.

Lab Exercises: Means

The egg contaminant data set has variables for mercury (HG) and log(HG). In Rave et al. (2014), we used the log-transformed mercury concentrations when testing for a difference in contaminant levels between years. Why? Hint: create dotplots (using gf_dotplot) for both the original mercury data and also the log-transformed data. What assumptions do we need to meet in order to use the t-distribution? For the rest of the lab, we will work with the log-transformed values.
I have included code to calculate the sample means (of the log-transformed values), their standard deviations, and the sample sizes in 1981 and in 2004. I have also included code to calculate the difference in means (of log-transformed values) for the two years. Run this code!
Test whether the mercury levels differ in the two years. I have included code to calculate the test-statistic “by hand” (again, make sure you can locate the appropriate formula!). What is the p-value? Hint: xpt(tstat, df=df) will compute the area under the curve to the left of tstat for a t-distribution with df degrees of freedom.
Create a 90% confidence interval for the difference in mean log-concentrations, $\mu_{2004}-\mu_{1981}$. Hint: you can use xqt(0.05, df=df) to calculate endpoints of the t-distribution with df degrees of freedom to construct a 90% confidence interval.

Now, we’ll recalculate the p-value and interval using the function t.test(). t.test() works similarly to prop.test(), except it tests for a difference in means rather than difference in proportions.

Use t.test(log.Hg ~ year, data=datasetname, conf.level=0.90) to calculate a p-value and create a 90% confidence interval for the difference in mean log-HG levels between years.
Interpret the confidence interval in the context of the problem.

I have written the code in this section for you!

Create a 90% bootstrap confidence interval for the difference in means (of log-transformed values) using the diffmean function and the resample function with the group= argument to sample the data from each year separately.
Compare these results the results you got using formulas and the t-distribution [Exercises 4, step [4])? The results will not match exactly, but should be close.

In this case, because of the small sample sizes, the bootstrap results from the simulation methods are more trustworthy, provided you generate enough simulations. The preciseness of the simulation methods depends on the number of generated simulations, while the preciseness of the theoretical method depends on the fit of the t-distribution, which we often cannot assess, particularly for small sample sizes.

Additional Problems

Confidence Interval Overlap versus a Test for Difference in Means
Paired data

When comparing different groups, it is tempting to make conclusions based on whether or not the confidence intervals for the different groups overlap. If confidence intervals for the different groups do not overlap, then we can safely say that the groups differ. Nonetheless, confidence intervals may overlap and yet a test using the difference in means (or proportions) may still be significant. Why? We have less confidence in the values near the tails of the distributions - and thus, overlap in the tails of both distributions may be associated with differences that lie outside the confidence interval for the mean difference. The exercise, below, will explore this issues using the mercury data.

We can use the t.test function to test whether the mean is 0 and also to get a confidence interval for the mean log-concentration in each year. For example, to get a 95% confidence interval for the mean log concentration in 1981, we could use t.test(~log.Hg, data=subset(datasetname, year==1981), conf.level=0.95).

Use the above approach to calculate a 95% confidence interval for the mean of the log Hg measurements in 1981 and in 2004. Then calculate a 95% confidence interval for the difference in means by modifying your code from Exercises 5 [step 1] accordingly.

You should find that the individual confidence intervals overlap, but the confidence interval for the difference in means does not contain 0 - i.e., a test for a difference in means is statistically significant!

The next example comes from Whitlock and Schluter’s Analysis of Biological data.

Example 12.2 So macho it makes you sick? (verbatim from Whitlock and Schluter)

In many species, males are more likely to attract females if the males have high testosterone levels. Are males with high testosterone paying for this extra mating success in other ways? One hypothesis is that males with high testosterone might be less able to fight off disease – that is, their high levels of testosterone might reduce their immunocompetence. To test this idea Hasselquist et al. (1999) experimentally increased the testosterone levels of 13 male and female red-winged blackbirds by implanting a small permeable tube filled with testosterone. They measured the antibody levels in each bird’s blood serum both before and after the implant. The antibody levels were measured optically, in units of log $10^{-3}$ optical density per minute (ln[mOD/min]).

These data can be accessed by loading the abd library and typing data(Blackbirds)

Use t.test to perform a test and compute a 95% confidence interval for the mean difference in logged before-after measurements: t.test(~diff.in.logs, data=Blackbirds). Interpret the results in context.

For this example, the dataset contained a column containing the differences. Alternatively, you could have passed two separate data columns to t.test, and included the paired=TRUE argument as follows:

library(abd)

with(Blackbirds, t.test(x=log.before, y=log.after, paired=TRUE))


    Paired t-test

data:  log.before and log.after
t = -1.2714, df = 12, p-value = 0.2277
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -0.15238464  0.04007695
sample estimates:
mean difference 
    -0.05615385

The with function allows us to refer to the columns of the Blackbirds data set without having to use the $ as in Blackbirds$log.before.

Literuture Cited

Rave, D. P., M. C. Zicus, J. R. Fieberg, L. Savoy, K. Regan. 2014. Trends in eggshell thickness and mercury in Common Goldeneye and Hooded Merganser eggs. Wildlife Society Bulletin 38(1):9–13.

Zicus, M. C., M. A. Briggs, and R. M. Pace, III. 1988. DDE, PCB, and mercury residues in Minnesota common goldeneye and hooded merganser eggs, 1981. Canadian Journal of Zoology 66:1871–1876.

A first draft of this lab was adapted from a lab created by Dr. Kari Lock-Morgan (which I can no longer find or access). In addition to changing much of the text, I have used a different data set and modified the coding exercises.

The lab is released under a Creative Commons Attribution-ShareAlike 3.0 Unported.