Testing for ESP

Author

John Fieberg

Published

January 7, 2026

Introduction

Every year, I use an active learning exercise suggested by the authors of Unlocking the Power of Data to introduce hypothesis tests. Students pair up with a partner. One of the two students “thinks” of a letter (A, B, C, D, or E) and then tries to telepathically communicate their choice to their partner. The partner then selects one of the 5 letters, and the “sender” of the information fills out a Google form recording:

Whether they believe in ESP or not.
Whether their partner guessed the letter they selected.

We will first look at data collected from this year, then consider data from the past several years.

Load libraries

library(mosaic)
library(dplyr)
library(googledrive)
library(googlesheets4)
set.seed(03222005)

Data from 2025

Read in data

esp<-read_sheet(sheetnm)

✔ Reading from "ESP2025 (Responses)".

✔ Range 'Form Responses 1'.

esp<-esp[,-1] # get rid of timestamp
names(esp)<-c("Believe", "Letter", "Correct")
esp <- filter(esp, !is.na(Correct)) # drop responses with no recorded outcome

Do you believe in ESP?

gf_percents(~Believe, data=esp)

Bar chart of believe, not believe, not sure

Do students have a preference for a letter?

gf_percents(~Letter, data=esp)

Proportion correct

gf_percents(~Correct, data=esp)

tally(~Correct, data=esp, format="proportion")

Correct
       No       Yes 
0.6388889 0.3611111

(phat<-prop(~Correct, success= "Yes", data=esp))

 prop_Yes 
0.3611111

Randomization distribution

Let’s determine what we might expect to see across repeated experiments with the same number of students if the null hypothesis is true - i.e., there is no ESP and students have a 1/5 chance of guessing the correct letter. The histogram, below, shows us what we might expect across repeated samples. The red line shows the sample proportion from today’s experiment.

nguesses<-nrow(esp) # Number of students trying to guess the correct letter
Null.dist<-do(10000)*{
    rflip(nguesses, p=1/5) # each student has a 1 in 5 chance of guessing correctly
}
head(Null.dist)

   n heads tails      prop
1 36     4    32 0.1111111
2 36     5    31 0.1388889
3 36     8    28 0.2222222
4 36     7    29 0.1944444
5 36     8    28 0.2222222
6 36     9    27 0.2500000

gf_histogram(~prop, data=Null.dist,
             binwidth=0.04, title = "Randomization Distribution") %>% 
  gf_vline(xintercept=~phat, data=NA, col="red")

Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the ggformula package.
  Please report the issue at
  <https://github.com/ProjectMOSAIC/ggformula/issues>.

Histogram of randomization distribution.

We can measure the strength of evidence against the null hypothesis by the p-value: the probability of getting an observation as extreme or more extreme than the one we got, given that the null hypothesis is true. Below is a “one-sided p-value”, determined by the proportion of simulated statistics (i.e., sample proportions) greater than or equal to the one we observed today.

(pvalue<-prop(~prop>=phat, data=Null.dist))

prop_TRUE 
   0.0165
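Because each simulated experiment is just a count of successes in 36 independent guesses, the simulated p-value can be cross-checked against an exact binomial tail probability. The sketch below uses base R only. The count of 13 correct guesses out of 36 is an assumption inferred from the reported proportion (0.3611 × 36), and the object names are illustrative.

```r
# Cross-check of the simulated one-sided p-value using base R only.
# Assumption: 13 correct guesses out of 36, inferred from phat = 0.3611111.
n_guesses <- 36
n_correct <- 13
p_null    <- 1/5

# Exact tail probability P(X >= 13) when X ~ Binomial(36, 0.2)
p_exact <- pbinom(n_correct - 1, size = n_guesses, prob = p_null,
                  lower.tail = FALSE)

# Same quantity from the built-in exact test
p_test <- binom.test(n_correct, n_guesses, p = p_null,
                     alternative = "greater")$p.value

# Simulation analogous to do(10000) * rflip(36, p = 1/5)
set.seed(1)
sim_counts <- rbinom(10000, size = n_guesses, prob = p_null)
p_sim <- mean(sim_counts >= n_correct)

c(exact = p_exact, binom.test = p_test, simulated = p_sim)
```

The simulated value should differ from the exact one only by Monte Carlo error, which shrinks as the number of simulated experiments grows.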

We can also calculate a two-sided p-value, say if we are also interested in detecting if p < 0.20 (here, we approximate this p-value by doubling the one-sided p-value).

2*pvalue

prop_TRUE 
    0.033

Let’s also create a bootstrap distribution so we can calculate a confidence interval for p.

bootphats<-do(10000)*{tally(~Correct, data=resample(esp), format="proportion")}
(CI<-qdata(~Yes, data=bootphats, p=c(0.025, 0.975)))

     2.5%     97.5% 
0.2215278 0.5277778

gf_histogram(~Yes, data=bootphats,
             binwidth=0.04, title = "Bootstrap Distribution") %>% 
  gf_vline(xintercept=~CI[1], data=NA, col="red") %>%
  gf_vline(xintercept=~CI[2], data=NA, col="red")

Bootstrap distribution and confidence interval
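As a rough check on the percentile interval, a normal-approximation (Wald) interval can be computed from the standard error formula sqrt(phat × (1 − phat) / n). This is a different technique from the bootstrap used here, shown only for comparison. It assumes 13 correct guesses out of 36 (inferred from the reported proportion), and the variable names are illustrative.

```r
# Normal-approximation (Wald) interval, for comparison with the bootstrap.
# Assumption: 13 correct out of 36, inferred from the reported proportion.
n_guesses <- 36
phat_hat  <- 13 / n_guesses
se_hat    <- sqrt(phat_hat * (1 - phat_hat) / n_guesses)
z_star    <- qnorm(0.975)          # about 1.96 for a 95% interval

wald_ci <- c(lower = phat_hat - z_star * se_hat,
             upper = phat_hat + z_star * se_hat)
round(wald_ci, 3)
```

With only 36 observations the two methods need not agree exactly, but both lower limits sit only slightly above 0.20.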

It’s interesting that 0.20 is not in the confidence interval, which, together with the p-value, suggests that students may do better than random guessing. Of course, this is just one year. If we were to repeat this experiment many times, and the null hypothesis were true, we would expect roughly 5% of the confidence intervals to exclude 0.20 and roughly 5% of the two-sided p-values to fall below 0.05.

Results broken down by Belief status

The plot below shows the percentage of correct answers by belief status. There are some interesting patterns here that we will come back to later (e.g., a higher proportion of correct guesses among those that believe in ESP).

gf_percents(~Correct, fill=~Believe, denom=~fill, data=esp,
            position = position_dodge())

Bar chart showing percent correct by belief status.

Past years

What about past years? Let’s look at data from years 2020, 2022, 2023, 2024, and 2025.

✔ Reading from "ESP2024 (Responses)".

✔ Range 'Form Responses 1'.

✔ Reading from "ESP2023 (Responses)".

✔ Range 'Form Responses 1'.

✔ Reading from "ESP2023 (Responses)".

✔ Range 'Form Responses 1'.

✔ Reading from "ESP 2020 (Responses)".

✔ Range 'Form Responses 1'.

Proportion of correct guesses

Overall proportion correct:

tally(~Correct, format = "proportion", data=espall)

Correct
       No       Yes 
0.6871795 0.3128205

Now, broken down by year (the proportion correct fluctuates from year to year and was very high in 2024 and 2025):

tally(~Correct | year, format="proportion", data=espall)

       year
Correct      2020      2022      2023      2024      2025
    No  0.7037037 0.7567568 0.7567568 0.5483871 0.6388889
    Yes 0.2962963 0.2432432 0.2432432 0.4516129 0.3611111

Is there a favorite letter?

There is no evidence to suggest that the letters chosen by students differ from a uniform distribution.

gf_percents(~Letter, data=esp)

xchisq.test(~ Letter, data = espall, p = rep(1/5, 5))


    Chi-squared test for given probabilities

data:  x
X-squared = 4.4615, df = 4, p-value = 0.3471

   33       50       38       35       39   
(39.00)  (39.00)  (39.00)  (39.00)  (39.00) 
[0.923]  [3.103]  [0.026]  [0.410]  [0.000] 
<-0.96>  < 1.76>  <-0.16>  <-0.64>  < 0.00> 
         
key:
    observed
    (expected)
    [contribution to X-squared]
    <Pearson residual>
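The goodness-of-fit statistic can be reproduced by hand from the observed counts printed in the output, which makes the bracketed “contribution to X-squared” values easier to interpret. This is a minimal base R sketch; the vector names are illustrative.

```r
# Goodness-of-fit statistic computed by hand from the observed counts
# printed in the xchisq.test() output (195 responses across 5 letters).
observed <- c(33, 50, 38, 35, 39)
expected <- rep(sum(observed) / 5, 5)   # 39 per letter under uniformity

# Each term is one letter's contribution to the statistic
contributions <- (observed - expected)^2 / expected
x2 <- sum(contributions)
p_value <- pchisq(x2, df = length(observed) - 1, lower.tail = FALSE)

round(contributions, 3)
round(c(X2 = x2, p = p_value), 4)
# Base R's chisq.test(observed, p = rep(1/5, 5)) reports the same statistic.
```

The largest contribution comes from the letter chosen 50 times, but the overall statistic is still well within what a uniform distribution would produce.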

Proportion that believe in ESP by year

Now, let’s look at the proportion that believe in ESP and how that has changed over time. Again, the sample proportions have fluctuated, but the proportion of believers is much smaller than the proportion reported in the Gallup poll, which was close to 50%:

tally(~Believe | year, format="proportion", data=espall)

          year
Believe         2020      2022      2023      2024      2025
  No       0.4629630 0.6216216 0.6216216 0.3548387 0.5277778
  Not sure 0.2222222 0.2432432 0.2432432 0.3870968 0.3055556
  Yes      0.3148148 0.1351351 0.1351351 0.2580645 0.1666667

Proportion correct, broken down by belief

Interestingly, we see that believers tend to record that their partner correctly guessed the letter whereas non-believers and those that are not sure about ESP are more likely to record that their partner did not guess the correct letter.

tally(~Correct | Believe, format="proportion", data=espall)

       Believe
Correct        No  Not sure       Yes
    No  0.7029703 0.8301887 0.4634146
    Yes 0.2970297 0.1698113 0.5365854

These patterns have been incredibly consistent across the past 4 years. However, in 2020, believers recorded a much smaller proportion of correct guesses than in other years.

gf_percents(~Correct | year, fill = ~Believe, 
                        position = position_dodge(), 
                        denom = ~interaction(fill, PANEL), data= espall)

Bar graph showing percent correct by year and belief

Confidence intervals

Bootstrap confidence interval for overall proportion correct. Note, 0.2 is not in this confidence interval.

bootphats <- do(5000)*{tally(~Correct, format = "proportion", 
                             data=resample(espall))}
confint(bootphats, method = "percentile", level = 0.95)

  name     lower     upper level     method  estimate
1   No 0.6205128 0.7538462  0.95 percentile 0.6871795
2  Yes 0.2461538 0.3794872  0.95 percentile 0.3128205
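A complementary check is an exact binomial test of the pooled data against the null value of 0.2. The counts used below (61 correct out of 195) are an assumption: they are reconstructed from the reported proportion and the total in the chi-squared output, not read from the raw data. Pooling also treats all responses as independent across years.

```r
# Exact one-sided binomial test for the pooled data.
# Assumption: 61 correct out of 195 (0.3128205 * 195), reconstructed from
# the output shown in this document rather than from the raw data.
n_total   <- 195
n_correct <- 61

pooled_test <- binom.test(n_correct, n_total, p = 1/5,
                          alternative = "greater")
pooled_test$estimate   # sample proportion
pooled_test$p.value    # exact P(X >= 61) under p = 0.2
```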

Confidence intervals for proportion correct for believers:

believers <- filter(espall, Believe == "Yes")
bootphats <- do(5000)*{tally(~Correct, format = "proportion",
                             data=resample(believers))}
confint(bootphats, method = "percentile", level = 0.95)

  name     lower     upper level     method  estimate
1   No 0.3170732 0.6097561  0.95 percentile 0.4634146
2  Yes 0.3902439 0.6829268  0.95 percentile 0.5365854

Confidence intervals for proportion correct for those that do not believe in ESP:

Non_believers <- filter(espall, Believe == "No")
bootphats <- do(5000)*{tally(~Correct, format = "proportion",
                             data=resample(Non_believers))}
confint(bootphats, method = "percentile", level = 0.95)

  name     lower     upper level     method  estimate
1   No 0.6138614 0.7920792  0.95 percentile 0.7029703
2  Yes 0.2079208 0.3861386  0.95 percentile 0.2970297

Confidence intervals for proportion correct for those that are not sure what to believe:

Notsure<- filter(espall, Believe == "Not sure")
bootphats <- do(5000)*{tally(~Correct, format = "proportion",
                             data=resample(Notsure))}
confint(bootphats, method = "percentile", level = 0.95)

  name     lower     upper level     method  estimate
1   No 0.7169811 0.9245283  0.95 percentile 0.8301887
2  Yes 0.0754717 0.2830189  0.95 percentile 0.1698113
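The three group-specific intervals repeat the same steps, so the resampling logic can be wrapped in a small helper. The sketch below is a base R version of a percentile bootstrap, not the mosaic code used in this document. The group counts (22/41, 30/101, 9/53) are assumptions reconstructed from the reported proportions, and `boot_prop_ci` is a hypothetical helper name.

```r
# Percentile bootstrap for a proportion, written as a reusable base R helper.
# Assumption: group counts are reconstructed from the reported proportions
# (22/41 believers, 30/101 non-believers, 9/53 not sure).
boot_prop_ci <- function(n_yes, n, reps = 5000, level = 0.95) {
  outcomes <- rep(c(1, 0), times = c(n_yes, n - n_yes))
  # Resample the group with replacement and record the proportion correct
  boot_props <- replicate(reps, mean(sample(outcomes, replace = TRUE)))
  alpha <- 1 - level
  ci <- quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2),
                 names = FALSE)
  c(lower = ci[1], upper = ci[2], estimate = n_yes / n)
}

set.seed(2025)
groups <- list(Yes = c(22, 41), No = c(30, 101), `Not sure` = c(9, 53))
res <- t(sapply(groups, function(g) boot_prop_ci(g[1], g[2])))
round(res, 3)
```

Because the resampling is random, the limits will vary slightly from run to run and from the values printed earlier.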

Conclusions

We appear to have pretty clear evidence, when looking across years, that the proportion of individuals that correctly guess their letter is greater than 0.2 (our expectation, given the null hypothesis). The proportion guessing correctly is also particularly high when the recorder/sender of the information is a believer. Does this mean that ESP is real, and believers can send information via telepathy? I’ve always been a strong skeptic, but the results are definitely intriguing. Could it be that believers “want” their partners to pick the correct letter and consciously or subconsciously record a correct guess in some cases where an incorrect letter was chosen? Could we design an experiment to test this more rigorously? Some things we could try to implement:

have a computer select the letter at random (to avoid the possibility of there being a favorite letter that individuals gravitate toward…though, note that we did not find evidence to suggest that students prefer certain letters!)
better separate the “sender” and “receiver” of the information to avoid any chance of visual or auditory clues
have a third / independent person record the data

After seeing the results in 2025, I reached out to the authors of Unlocking the Power of Data to see if they had been collecting similar data and, if so, whether they had found similar results. Although they had not, Kari Lock Morgan shared with me that Jessica Utts (a former president of the American Statistical Association) has done extensive formal statistical analysis of ESP and is convinced that it is real. Utts summarized information from many government-funded studies and concluded:

Using the standards applied to any other area of science, it is concluded that psychic functioning has been well established. The statistical results of the studies examined are far beyond what is expected by chance. Arguments that these results could be due to methodological flaws in the experiments are soundly refuted. Effects of a magnitude similar to those found in government-sponsored research at SRI and SAIC have been replicated at a number of laboratories around the world. Such consistency cannot be readily explained by claims of flaws or fraud. The magnitude of psychic functioning exhibited appears to be in the range between what social scientists call a small and a medium effect. It is thus reliable enough to be replicated in properly conducted experiments, with sufficient trials to achieve the long-run statistical results needed for replicability. A number of other patterns have been found, suggestive of how to conduct more productive experiments and to produce applied psychic functioning. For instance, it does not appear that a sender is needed. Precognition, in which the relevant information is known to no one until a future time, appears to work quite well. Recent experiments suggest that, if there is a psychic sense, it works much as our other five senses do, by detecting change. Physicists are currently grappling with an understanding of time, and it may be that a psychic sense scans the future for major change, much as our eyes scan the environment for visual change or our ears allow us to respond to sudden changes in sound.