Why the second example?

To highlight R’s capabilities:

To illustrate how to create bootstrap and randomization distributions with paired data

  • The key is creating a variable equal to the difference between the paired observations (e.g., “before” and “after” or, in this case, “VHF” and “GPS”)

Example using Spider Speed

library(abd)
library(dplyr)
set.seed(10004)
data(SpiderSpeed)
head(SpiderSpeed)
  speed.before speed.after
1         1.25        2.40
2         2.94        3.50
3         2.38        4.49
4         3.09        3.17
5         3.41        5.26
6         3.00        3.22

Create a variable holding differences

SpiderSpeed <- mutate(SpiderSpeed, After.minus.Before=speed.after-speed.before)
head(SpiderSpeed)
  speed.before speed.after After.minus.Before
1         1.25        2.40               1.15
2         2.94        3.50               0.56
3         2.38        4.49               2.11
4         3.09        3.17               0.08
5         3.41        5.26               1.85
6         3.00        3.22               0.22

Bootstrap CI for \(\mu_{After}-\mu_{Before}\)

Resample differences!

bootdist<- do(1000)*mean(~After.minus.Before, data=resample(SpiderSpeed))
(confspeed<-confint(bootdist))
  name     lower    upper level     method estimate
1 mean 0.7098594 1.660281  0.95 percentile 1.185625

We are 95% confident that the mean difference in running speed after removing a pedipalp (versus before) is between 0.71 and 1.66.
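To make the resampling step concrete, here is a minimal base-R sketch of a percentile bootstrap on the differences. It assumes only the six differences printed by head(SpiderSpeed) above, so its interval will not match the full-data result; the names d, boot_means, and ci are illustrative and not part of the original code.

```r
# Illustrative subset: the six differences shown by head(SpiderSpeed),
# not the full data set
d <- c(1.15, 0.56, 2.11, 0.08, 1.85, 0.22)

set.seed(1)

# Resample the differences with replacement and record the mean each time
boot_means <- replicate(1000, mean(sample(d, size = length(d), replace = TRUE)))

# Percentile interval: the middle 95% of the bootstrap means
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```

The point of the sketch is that each bootstrap sample keeps a pair together by resampling its difference, rather than resampling the before and after values separately.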

Randomization Distribution

Test \(H_0: \mu_{After}-\mu_{Before}=0\)

head(SpiderSpeed)
  speed.before speed.after After.minus.Before
1         1.25        2.40               1.15
2         2.94        3.50               0.56
3         2.38        4.49               2.11
4         3.09        3.17               0.08
5         3.41        5.26               1.85
6         3.00        3.22               0.22

If \(H_0\) is true, the labels “Before” and “After” are meaningless.

\(\implies\) we want to shuffle the before and after observations within the rows (to keep the data paired).

Shuffling within rows is equivalent to randomly choosing the sign of each difference:

2.40 - 1.25
[1] 1.15
1.25 - 2.40
[1] -1.15
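The equivalence can be checked directly. The sketch below assumes only the six rows printed by head(SpiderSpeed); the vectors before, after, and swap are illustrative names. It swaps the two values in a random subset of rows and confirms that the resulting differences equal the original differences with the sign flipped in exactly those rows.

```r
# Illustrative subset: the six rows shown by head(SpiderSpeed)
before <- c(1.25, 2.94, 2.38, 3.09, 3.41, 3.00)
after  <- c(2.40, 3.50, 4.49, 3.17, 5.26, 3.22)

set.seed(1)

# Decide at random, row by row, whether to swap the two labels
swap <- sample(c(TRUE, FALSE), size = length(before), replace = TRUE)

# Differences after swapping labels within the chosen rows
new_before   <- ifelse(swap, after, before)
new_after    <- ifelse(swap, before, after)
diff_swapped <- new_after - new_before

# Original differences with the sign flipped in the same rows
diff_signed <- ifelse(swap, -1, 1) * (after - before)

all.equal(diff_swapped, diff_signed)
```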

Randomization Distribution

We can create a data set consistent with \(H_0\) by randomly choosing the sign of each difference.

rsign<-resample(c(-1,1), size=nrow(SpiderSpeed))
head(rsign*SpiderSpeed$After.minus.Before)
[1]  1.15  0.56  2.11  0.08 -1.85 -0.22

We can then calculate one observation (a mean difference) from the randomization distribution using:

mean(rsign*SpiderSpeed$After.minus.Before)
[1] 0.113125

And, to create the randomization distribution, we just need to put all of this within { } and use the do function:

rand.dist<- do(5000)*{
  rsign<-resample(c(-1,1), size=nrow(SpiderSpeed))
  mean(rsign*SpiderSpeed$After.minus.Before)
}
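A natural next step is to compare the observed mean difference with the randomization distribution to get a p-value. The base-R sketch below is one way to do that, not necessarily the approach used in the rest of these slides. It assumes only the six differences printed by head(SpiderSpeed), so the result is illustrative; with the full data, the observed statistic would be the estimate reported by confint above (1.185625). The names d, obs, rand_means, and p_two_sided are hypothetical.

```r
# Illustrative subset: the six differences shown by head(SpiderSpeed)
d   <- c(1.15, 0.56, 2.11, 0.08, 1.85, 0.22)
obs <- mean(d)  # observed mean difference for this subset

set.seed(1)

# Randomization distribution: flip the sign of each difference at random
rand_means <- replicate(5000, {
  rsign <- sample(c(-1, 1), size = length(d), replace = TRUE)
  mean(rsign * d)
})

# Two-sided p-value: proportion of randomized means at least as extreme
# as the observed one (small tolerance guards against rounding error)
p_two_sided <- mean(abs(rand_means) >= abs(obs) - 1e-12)
p_two_sided
```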