Today, you will work through a series of problems involving multiple groups (i.e., problems involving a categorical variable with more than 2 levels). You will become comfortable analyzing data using:
a \(\chi^2\) test for association between 2 categorical variables
ANOVA (Analysis of Variance) when comparing means of multiple groups
In addition, you will gain exposure to methods for dealing with multiple (pairwise) comparisons
Don’t forget to “set the seed” of the random number generator in R so that each time you render the lab document you get the same answer (it helps with writing up your answers :).
\(\chi^2\) Test of Association: Eggshell Removal in Black-headed Gulls
In last week’s lab, we considered an experiment by Tinbergen and his colleagues to evaluate whether having an eggshell near a nest of eggs increased the likelihood that predators would find the nest (Tinbergen et al. 1962). In this lab, we will consider a second experiment in which they evaluated whether or not it matters how far the eggshell is from the eggs. Specifically, they placed eggshells 15cm, 100cm, or 200cm away from a set of eggs and then determined the proportion of ``nests’’ in each group that were found and taken by predators:
We can test whether there is an association between distance (from eggshell to eggs) and the proportion of eggs found by predators using chisq.test or the “mosiac-enhaNced” xchisq.test function in R. The easiest way to use this function is to pass it a table or matrix of counts. We can either input these counts directly or create them from a dataset as follows:
# Option 1: create the matrix of counts from scratchtable.egg1<-matrix(c(63,87,48,102,32,118), ncol=2, byrow=T)
# Option 2: first, create the data. Then, tally the counts in the second lineeggdat<-data.frame(group=rep(c("15cm", "100cm", "200cm"), c(150,150, 150)), predated=rep(c("Yes","No","Yes","No","Yes","No"),c(63,87,48,102,32,118)))table.egg2<-tally(predated~group, data=eggdat,format="count")table.egg2
group
predated 100cm 15cm 200cm
No 102 87 118
Yes 48 63 32
Notice, in the second case, that R will by default order the columns (or rows) alpha-numerically (thus 100cm, 15cm, then 200cm). We could fix this by typing:
State the null and alternative hypothesis, and use the xchisq.test function to perform the test.
The xchisq.test function will also let you conduct the test using a randomization procedure if you use the argument simulate.p.value = TRUE. When would you want to use this option?
Look at the residuals and expected values for each cell. In what way do the data deviate from what you would expect IF the Null Hypothesis were true?
In the next example, we will explore a dataset collected by researcher at the MN DNR. Many wildlife research studies employ GPS- or VHF-collars on individuals in order to follow animal movements, determine habitat use and preference, and to monitor survival rates. It is important to evaluate collar performance prior to conducting a study in order to determine the effect of habitat on the probability of obtaining a successful location and also on the degree of measurement error associated with the locations (e.g., average distance between estimated and true locations). Closed canopies often result in larger measurement errors and more missing data Frair et al. 2010).
Bob Wright, a GIS guru with the MN DNR ran a series of tests using calf collars prior to deploying them on moose. He did this by putting the collars in 4 different habitat types (see Figure on previous page). He left them in the same spot for several days, and then determined the location error associated with the measurements in each of the 4 different cover types. The data are in a file called CollarTest.csv.
Use dim(CollarTest) to determine how many cases are in the dataset. The function dim returns the dimension (number of rows, number of columns). Use names to determine the variable names in the data set. These functions provide another way to inspect your data (other than clicking on the dataset name in the upper right Environment window).
How many locations are there in each of the 4 cover types? Hint: you can use the favstats function. If you have forgotten how to use favstats, view the help file for this function by typing ?favstats. Look at the examples at the end of the help file.
Note, the 3rd variable, DOP, stands for dilution of precision and provides a measure of GPS receiver/satellite geometry. In general, lower values should be indicative of more accurate locations. Also, note that Error measures the distance between true and estimated values.
Construct a plot to explore the distribution of errors in each of the 4 cover types. Also, find the mean error in each cover type. Which cover type resulted in the largest errors on average?
We can use ANOVA to test for a difference in mean error across the four cover types. Before proceeding with ANOVA and the F-distribution, we need to make sure certain conditions are satisfied.
Are the sample sizes large enough in each group (\(\ge 30\)) and/or are the data approximately normally distributed? Is the equal variability condition satisfied? Hint: calculate the standard deviation for each group. As rough rule of thumb, the constant variance assumption is violated if the standard deviation of one group is more than double the standard deviation of another group. In addition, your plot from the previous section may help!
I have also added code to compare the standard deviations of the log-transformed error values. Are these assumptions likely to be met for the log-transformed values?
In step 1 of the previous set of exercises, you should have found that the assumption of equal variance within each group was not met. As is often the case with skewed data, a log-transformation can help meet assumptions of normality and equal variance (we saw this in last week’s lab too). If you have 0’s in the data, then the log-transformation will be problematic since log(0) is undefined. Sometimes people will add a small number to all of the data or try a sqrt transformation. Or, better yet, one can use distributions other than a Normal distribution for inference (but, that is for another, more advanced class). For today’s lab, you should work with log-transformed values - i.e., use log(Error) rather than Error in the exercises below.
R will create the entire ANOVA table for you. Conducting an Analysis of Variance is no different from fitting a model with a separate mean for each group (tests using the F-distribution also require large sample sizes or normality and an assumption of equal variability within each group). We can construct ANOVA tables using lm() followed by anova(). Eventually we will want to be able to build models with both continuous and categorical predictors (e.g., we might consider a model that includes both DOP and CoverType). Thus, for today’s lab we will use the lm function in R, which stands for Linear Model. This will help get you started with building more complex models next week ;).
State the null and alternative hypotheses for the ANOVA test.
Use lm to fit the model and assign the result to a name, say: anova.fit<-lm(log(Error)~CoverType, data=CollarTest). To produce the ANOVA table, type: anova(anova.fit). What is the p-value? What does this result tell you about location errors in the different habitat types?
You should end up rejecting the null hypothesis, concluding that at least one of the means differs from the others. Usually, we would like to follow-up by asking, `which means differ?’ In addition, we may want to calculate confidence intervals for specific means (e.g., one of the specific cover types in the above example). In such cases, we can use the normal-based proceedures from Section 6, except:
We estimate all of the \(\sigma_i\)’s with \(\sqrt{\mbox{MSE}}\) instead of \(s_i\)
We use the error df for any t-values
Example formulas:
CI for \(\mu_i: \bar{x}_i \pm t \sqrt{\frac{\mbox{MSE}}{n_i}}\)
CI for \(\mu_i - \mu_j: (\bar{x}_i-\bar{x}_j) \pm t \sqrt{\mbox{MSE}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}\)
Test for \(\mu_i = \mu_j\) versus \(\mu_i \ne \mu_j: t = \frac{(\bar{x}_i-\bar{x}_j)}{\sqrt{\mbox{MSE}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}}\)
In each case, \(s\) or \(s_i\) has been replaced with \(\sqrt{\mbox{MSE}}\).
Multiple Comparisons
One of the main complications involved with digging deeper is that there are often many possible comparisons that could be made. With 4 categories - as in the current example - there are ``4 choose 2’’ = 6 possible comparisons. Remember the lecture on hypothesis tests and the problems we discussed with conducting lots of tests? The same principle holds true here. If there is an \(\alpha\) chance of making a type I erorr (rejecting the null hypothesis when it is true) for each test, the overall Type I error rate can be much higher than \(\alpha\).
There are a variety of techniques that have been developed to help control the probability of making a type I error. In general, it is a good idea to:
Only do pairwise tests if the overall ANOVA indicates some difference exists.
Only do a few important comparisons.
Consider using a smaller \(\alpha\) for each pairwise test.
The emmeans package allows us to estimate means and confidence intervals easily, and also conduct pairwise comparisons. In the latter case, we can also adjust for the fact that we are making multiple comparisons, thus lowering the overall Type I error.
The following exercise will have you explore these issues.
Calculate the means of the log location errors within each cover type: use mean(log(Error)~CoverType, data=CollarTest).
Use emmeans(anova.fit, list(pairwise~CoverType), adjust="none") to compute all pairwise differences. Use the p-values associated with these pairwise tests to summarize the pairwise comparisons (as in the lab intro slides).
Use emmeans(anova.fit, list(pairwise~CoverType), adjust="tukey") to compute all pairwise differences. Use the p-values associated with these pairwise tests to summarize the pairwise comparisons (as in the lab intro slides).
What effect did adjust="tukey" have on the results?
When the conditions for ANOVA and the F-distribution are not satisfied, you can always do a randomization test using shuffle. If we permute (i.e., shuffle) the group labels, then we are testing whether the distributions of the Error values are identical for each group. This test makes most sense in experimental settings, i.e., where the groups are randomly assigned. For observational data, however, it is important to note that the test could be rejected because the means differ or because the variances differ; we are actually testing whether the distributions are the same (Johnson 1995)! Alternatively, we could conduct the test by first making the means equal, then bootstrapping (as we did with the temperature data example earlier in class). For observational data, this approach makes the most sense.
More importantly, Null hypothesis tests of this sort are often not very interesting. We rarely care if the means differ. More than likely they do at some level - we just may not have enough data to reject the Null hypothesis. What is more important, is how much they differ (Johnson 1999). This latter question can be addressed using the bootstrap.
An advantage of bootstrapping is that we can calculate means and confidence intervals on the original scale (i.e., using Error rather than log(Error)). An easy way to compute bootstrap confidence intervals is to use the mean function, followed by confint, shown below:
We can also get confidence intervals for differences in means. For example, we can create a variable containg the differences we are interested in and then use the qdata function (below, for a 95% confidence interval):
Create a bootstrap distribution of mean values for each group using the example code (above). Calculate a 90% confidence interval for the mean Error value in each cover type. To figure out how to alter the confidence level, type ?confint.
Calculate a 90% confidence interval for the difference in mean Error values for EarlyBrush - HeavyBrush by modifying the code using the qdata function, above.
Literature Cited
Frair, J. L., J. Fieberg, M. Hebblewhite, F. Cagnacci, N. DeCesare, and L Pedrotti. 2010. Resolving issues of imprecise and habitat-biased locations in ecological analyses using GPS telemetry data. Philosophical Transactions of the Royal Society, Series B 365:2187-2200.
Johnson, Douglas H. 1995. Statistical sirens: the allure of nonparametrics. Ecology 76: 1998-2000.
Johnson, Douglas H. 1999. The insignificance of statistical significance testing. The journal of wildlife management 63:763-772.
Tinbergen, Niko, et al. 1962. Egg shell removal by the black-headed gull, Larus ridibundus L.; a behaviour component of camouflage. Behaviour 19: 74-116.
A first draft of this lab was adapted from a lab created by Dr. Kari Lock-Morgan (which I can no longer find or access). In addition to changing much of the text, I have used a different data set and modified the coding exercises.