library(mosaic)
library(dplyr)
library(googledrive)
library(googlesheets4)
set.seed(03222005)
Testing for ESP
Introduction
Every year, I use an active learning exercise suggested by the authors of Unlocking the Power of Data to introduce hypothesis tests. Students pair up with a partner. One of the two students “thinks” of a letter (A, B, C, D, or E) and then tries to telepathically communicate that choice to their partner. The partner then selects one of the 5 letters, and the “sender” of the information fills out a Google form recording:
- Whether they believe in ESP or not.
- Whether their partner guessed the letter they selected.
We will first look at the data collected this year, then consider data from the past several years.
Load libraries
Data from 2025
Read in data
esp<-read_sheet(sheetnm)
✔ Reading from "ESP2025 (Responses)".
✔ Range 'Form Responses 1'.
esp<-esp[,-1] # get rid of timestamp
names(esp)<-c("Believe", "Letter", "Correct")
esp <- filter(esp, is.na(Correct)==FALSE)
Do you believe in ESP?
gf_percents(~Believe, data=esp)
Do students have a preference for a letter?
gf_percents(~Letter, data=esp)
Proportion correct
gf_percents(~Correct, data=esp)
tally(~Correct, data=esp, format="proportion")
Correct
No Yes
0.6388889 0.3611111
(phat<-prop(~Correct, success= "Yes", data=esp))
prop_Yes
0.3611111
Randomization distribution
Let’s determine what we might expect to see across repeated experiments with the same number of students if the null hypothesis is true - i.e., there is no ESP and students have a 1/5 chance of guessing the correct letter. The histogram, below, shows us what we might expect across repeated samples. The red line shows the sample proportion from today’s experiment.
nguesses<-nrow(esp) # Number of students trying to guess the correct letter
Null.dist<-do(10000)*{
rflip(nguesses, p=1/5) # each student has a 1 in 5 chance of guessing correctly
}
head(Null.dist)
n heads tails prop
1 36 4 32 0.1111111
2 36 5 31 0.1388889
3 36 8 28 0.2222222
4 36 7 29 0.1944444
5 36 8 28 0.2222222
6 36 9 27 0.2500000
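For readers without mosaic, the same kind of null distribution can be sketched in base R. This is a minimal sketch rather than the code used above: it assumes 36 guesses (the `n` column in the output), and the names `n_guesses` and `null_props` are illustrative.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

n_guesses <- 36  # assumed: number of rows in esp, read off the output above

# Each draw is the number of correct guesses among 36 students who each
# have a 1-in-5 chance of being right; dividing by 36 gives a proportion.
null_props <- rbinom(10000, size = n_guesses, prob = 1/5) / n_guesses

# The simulated proportions should be centered near 0.2
summary(null_props)
```

Drawing one binomial count per simulated experiment plays the same role as flipping 36 biased coins with `rflip()` and recording the proportion of heads.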
gf_histogram(~prop, data=Null.dist,
binwidth=0.04, title = "Randomization Distribution") %>%
gf_vline(xintercept=~phat, data=NA, col="red")
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggformula package.
Please report the issue at
<https://github.com/ProjectMOSAIC/ggformula/issues>.
We can measure the strength of evidence against the null hypothesis with the p-value: the probability of getting an observation as extreme or more extreme than the one we got, given that the null hypothesis is true. Below is a “one-sided p-value”, determined by the proportion of statistics (i.e., sample proportions) greater than or equal to the one we observed today.
(pvalue<-prop(~prop>=phat, data=Null.dist))
prop_TRUE
0.0165
We can also calculate a two-sided p-value, say if we are also interested in detecting if p < 0.20 (here, we approximate this p-value by doubling the one-sided p-value).
2*pvalue
prop_TRUE
0.033
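Because the number of correct guesses under the null hypothesis follows a binomial distribution, the simulated p-values can be checked against exact ones. The sketch below assumes 13 correct guesses out of 36 (0.3611111 × 36, from the output above), and the variable names are illustrative. Note that `binom.test()` defines its two-sided p-value by summing the probabilities of all outcomes no more likely than the observed one, rather than by doubling, so the two approaches need not agree exactly.

```r
n_guesses <- 36  # assumed from the output above
n_correct <- 13  # assumed: 0.3611111 * 36

# Exact one-sided p-value: P(X >= 13) when X ~ Binomial(36, 0.2)
exact_one_sided <- pbinom(n_correct - 1, size = n_guesses, prob = 1/5,
                          lower.tail = FALSE)

# The same quantity via binom.test()
bt_one_sided <- binom.test(n_correct, n_guesses, p = 1/5,
                           alternative = "greater")$p.value

# Exact two-sided p-value (not obtained by doubling)
bt_two_sided <- binom.test(n_correct, n_guesses, p = 1/5,
                           alternative = "two.sided")$p.value

c(one_sided = exact_one_sided, two_sided = bt_two_sided)
```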
Let’s also create a bootstrap distribution so we can calculate a confidence interval for p.
bootphats<-do(10000)*{tally(~Correct, data=resample(esp), format="proportion")}
(CI<-qdata(~Yes, data=bootphats, p=c(0.025, 0.975)))
2.5% 97.5%
0.2215278 0.5277778
gf_histogram(~Yes, data=bootphats,
binwidth=0.04, title = "Bootstrap Distribution") %>%
gf_vline(xintercept=~CI[1], data=NA, col="red") %>%
gf_vline(xintercept=~CI[2], data=NA, col="red")
It’s interesting that 0.20 is not in the confidence interval, which, together with the p-value, suggests that students may do better than random guessing. Of course, this is just one year. If we were to repeat this many times, and the null hypothesis were true, we would expect that roughly 5% of the time we would get a confidence interval that doesn’t include 0.20 and a p-value < 0.05.
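The percentile bootstrap used above can also be sketched in base R, which makes the resampling step explicit. This is an illustration under stated assumptions, not the original code: the 2025 outcomes are reconstructed as 13 "Yes" and 23 "No" (from 0.3611111 × 36), and `correct`, `boot_props`, and `boot_ci` are illustrative names.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

# Reconstructed 2025 outcomes: 13 "Yes" and 23 "No" (an assumption)
correct <- rep(c("Yes", "No"), times = c(13, 23))

# Resample the 36 outcomes with replacement and record the proportion correct
boot_props <- replicate(10000, mean(sample(correct, replace = TRUE) == "Yes"))

# Percentile interval: the middle 95% of the bootstrap proportions
boot_ci <- quantile(boot_props, probs = c(0.025, 0.975))
boot_ci
```

Because the seed and random number stream differ from the mosaic run, the endpoints may not match the interval reported above exactly.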
Results broken down by Belief status
Correct answers (for those that do and do not believe). There are some interesting patterns here that we will come back to later (e.g., a higher proportion of correct guesses among those that believe in ESP).
gf_percents(~Correct, fill=~Believe, denom=~fill, data=esp,
position = position_dodge())
Past years
What about past years? Let’s look at data from years 2020, 2022, 2023, 2024, and 2025.
✔ Reading from "ESP2024 (Responses)".
✔ Range 'Form Responses 1'.
✔ Reading from "ESP2023 (Responses)".
✔ Range 'Form Responses 1'.
✔ Reading from "ESP2023 (Responses)".
✔ Range 'Form Responses 1'.
✔ Reading from "ESP 2020 (Responses)".
✔ Range 'Form Responses 1'.
Proportion of correct guesses
Overall proportion correct:
tally(~Correct, format = "proportion", data=espall)
Correct
No Yes
0.6871795 0.3128205
Now, broken down by year (the proportion correct fluctuates from year to year and was very high in 2024 and 2025):
tally(~Correct | year, format="proportion", data=espall)
year
Correct 2020 2022 2023 2024 2025
No 0.7037037 0.7567568 0.7567568 0.5483871 0.6388889
Yes 0.2962963 0.2432432 0.2432432 0.4516129 0.3611111
Is there a favorite letter?
There is no evidence to suggest that the letters chosen by students differ from a uniform distribution.
gf_percents(~Letter, data=esp)
xchisq.test(~ Letter, data = espall, p = rep(1/5, 5))
Chi-squared test for given probabilities
data: x
X-squared = 4.4615, df = 4, p-value = 0.3471
33 50 38 35 39
(39.00) (39.00) (39.00) (39.00) (39.00)
[0.923] [3.103] [0.026] [0.410] [0.000]
<-0.96> < 1.76> <-0.16> <-0.64> < 0.00>
key:
observed
(expected)
[contribution to X-squared]
<Pearson residual>
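The chi-squared statistic above can be reproduced by hand from the observed counts, which shows where each contribution comes from. This sketch assumes the five counts in the output are listed in the order A through E; the variable names are illustrative.

```r
# Observed letter counts, copied from the output above.
# Assumption: they are listed in the order A, B, C, D, E.
observed <- c(A = 33, B = 50, C = 38, D = 35, E = 39)

# Under a uniform distribution, each letter is expected 195 / 5 = 39 times
expected <- sum(observed) / length(observed)

# Chi-squared statistic: sum of (observed - expected)^2 / expected
x2 <- sum((observed - expected)^2 / expected)

# p-value from a chi-squared distribution with 5 - 1 = 4 degrees of freedom
p_value <- pchisq(x2, df = length(observed) - 1, lower.tail = FALSE)

# Base R's chisq.test() computes the same statistic
fit <- chisq.test(observed, p = rep(1/5, 5))

c(x2 = x2, p_value = p_value)
```

Most of the statistic comes from the second count (50 observed against 39 expected), which matches the largest contribution shown in the output.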
Proportion that believe in ESP by year
Now, let’s look at the proportion that believe in ESP and how that has changed over time (again, the sample proportions have fluctuated, but the proportion of believers is much smaller than the proportion reported in the Gallup poll, which was close to 50%):
tally(~Believe | year, format="proportion", data=espall)
year
Believe 2020 2022 2023 2024 2025
No 0.4629630 0.6216216 0.6216216 0.3548387 0.5277778
Not sure 0.2222222 0.2432432 0.2432432 0.3870968 0.3055556
Yes 0.3148148 0.1351351 0.1351351 0.2580645 0.1666667
Proportion correct, broken down by belief
Interestingly, we see that believers tend to record that their partner correctly guessed the letter whereas non-believers and those that are not sure about ESP are more likely to record that their partner did not guess the correct letter.
tally(~Correct | Believe, format="proportion", data=espall)
Believe
Correct No Not sure Yes
No 0.7029703 0.8301887 0.4634146
Yes 0.2970297 0.1698113 0.5365854
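One way to check whether the differences between belief groups are larger than chance alone would produce is a chi-squared test of independence. The sketch below is not part of the original analysis. The counts are reconstructed from the proportions above (30/101, 9/53, and 22/41 correct, which sum to the 195 responses implied elsewhere in the output), so they are an assumption rather than reported data.

```r
# Counts reconstructed from the proportions in the table above (an assumption).
# matrix() fills column by column: each pair is (No, Yes) for one belief group.
counts <- matrix(c(71, 30,
                   44,  9,
                   19, 22),
                 nrow = 2,
                 dimnames = list(Correct = c("No", "Yes"),
                                 Believe = c("No", "Not sure", "Yes")))

# Column proportions should reproduce the tally() output
col_props <- prop.table(counts, margin = 2)
col_props

# Chi-squared test of independence between Correct and Believe
indep_test <- chisq.test(counts)
indep_test
```

A small p-value here would say only that the recorded outcomes differ by belief group; it cannot distinguish a real effect from differences in how senders record the result.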
These patterns have been incredibly consistent across the past 4 years. However, in 2020, believers recorded a much smaller proportion of correct guesses than in other years.
gf_percents(~Correct | year, fill = ~Believe,
position = position_dodge(),
denom = ~interaction(fill, PANEL), data= espall)
Confidence intervals
Bootstrap confidence interval for overall proportion correct. Note, 0.2 is not in this confidence interval.
bootphats <- do(5000)*{tally(~Correct, format = "proportion",
data=resample(espall))}
confint(bootphats, method = "percentile", level = 0.95)
name lower upper level method estimate
1 No 0.6205128 0.7538462 0.95 percentile 0.6871795
2 Yes 0.2461538 0.3794872 0.95 percentile 0.3128205
Confidence intervals for proportion correct for believers:
believers <- filter(espall, Believe == "Yes")
bootphats <- do(5000)*{tally(~Correct, format = "proportion",
data=resample(believers))}
confint(bootphats, method = "percentile", level = 0.95)
name lower upper level method estimate
1 No 0.3170732 0.6097561 0.95 percentile 0.4634146
2 Yes 0.3902439 0.6829268 0.95 percentile 0.5365854
Confidence intervals for proportion correct for those that do not believe in ESP:
Non_believers <- filter(espall, Believe == "No")
bootphats <- do(5000)*{tally(~Correct, format = "proportion",
data=resample(Non_believers))}
confint(bootphats, method = "percentile", level = 0.95)
name lower upper level method estimate
1 No 0.6138614 0.7920792 0.95 percentile 0.7029703
2 Yes 0.2079208 0.3861386 0.95 percentile 0.2970297
Confidence intervals for proportion correct for those that are not sure what to believe:
Notsure<- filter(espall, Believe == "Not sure")
bootphats <- do(5000)*{tally(~Correct, format = "proportion",
data=resample(Notsure))}
confint(bootphats, method = "percentile", level = 0.95)
name lower upper level method estimate
1 No 0.7169811 0.9245283 0.95 percentile 0.8301887
2 Yes 0.0754717 0.2830189 0.95 percentile 0.1698113
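As a cross-check on the bootstrap intervals, exact (Clopper-Pearson) binomial intervals can be computed with `binom.test()` in base R. This is a sketch under the assumption that the counts reconstructed from the reported proportions are right (61/195 overall, 22/41 for believers, 30/101 for non-believers, 9/53 for those not sure); the data frame and column names are illustrative.

```r
# Counts reconstructed from the proportions reported above (an assumption):
# correct guesses and total guesses, overall and by belief group.
groups <- data.frame(
  group   = c("Overall", "Believers", "Non-believers", "Not sure"),
  correct = c(61, 22, 30, 9),
  n       = c(195, 41, 101, 53)
)

# Exact 95% interval for each group's proportion of correct guesses
exact_ci <- t(mapply(function(x, n) binom.test(x, n)$conf.int,
                     groups$correct, groups$n))
colnames(exact_ci) <- c("lower", "upper")

result <- cbind(groups, estimate = groups$correct / groups$n, exact_ci)
result
```

Exact intervals tend to be slightly wider than percentile bootstrap intervals, so small differences from the output above are expected.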
Conclusions
We appear to have pretty clear evidence, when looking across years, that the proportion of individuals that correctly guess their letter is greater than 0.2 (our expectation, given the null hypothesis). The proportion guessing correctly is also particularly high when the recorder/sender of the information is a believer. Does this mean that ESP is real, and believers can send information via telepathy? I’ve always been a strong skeptic, but the results are definitely intriguing. Could it be that believers “want” their partners to pick the correct letter and consciously or subconsciously record a correct guess in some cases where an incorrect letter was chosen? Could we design an experiment to test this more rigorously? Some things we could try to implement:
- have a computer select the letter at random (to avoid the possibility of there being a favorite letter that individuals gravitate toward…though, note that we did not find evidence to suggest that students prefer certain letters!)
- better separate the “sender” and “receiver” of the information to avoid any chance of visual or auditory clues
- have a third / independent person record the data
After seeing the results in 2025, I reached out to the authors of Unlocking the Power of Data to see if they had been collecting similar data and, if so, whether they had found similar results. Although they had not been collecting similar data, Kari Lock Morgan shared with me that Jessica Utts (a former president of the American Statistical Association) had done a lot of formal statistical analysis of ESP and is convinced that it is real. Utts summarized information from many studies that were funded by the government and suggests that:
Using the standards applied to any other area of science, it is concluded that psychic functioning has been well established. The statistical results of the studies examined are far beyond what is expected by chance. Arguments that these results could be due to methodological flaws in the experiments are soundly refuted. Effects of a magnitude similar to those found in government-sponsored research at SRI and SAIC have been replicated at a number of laboratories around the world. Such consistency cannot be readily explained by claims of flaws or fraud. The magnitude of psychic functioning exhibited appears to be in the range between what social scientists call a small and a medium effect. It is thus reliable enough to be replicated in properly conducted experiments, with sufficient trials to achieve the long-run statistical results needed for replicability. A number of other patterns have been found, suggestive of how to conduct more productive experiments and to produce applied psychic functioning. For instance, it does not appear that a sender is needed. Precognition, in which the relevant information is known to no one until a future time, appears to work quite well. Recent experiments suggest that, if there is a psychic sense, it works much as our other five senses do, by detecting change. Physicists are currently grappling with an understanding of time, and it may be that a psychic sense scans the future for major change, much as our eyes scan the environment for visual change or our ears allow us to respond to sudden changes in sound.