Let me shoot straight out of the box: There is a high profile research paper that claims to provide evidence for the existence of psychic phenomena. I am not buying into it!
In a previous blog post I expressed my reservations towards the exciting study by carelessly holding up to ridicule what I continue to perceive as shortcomings of the cited research. It was an attempt at satire in the Wikipedia sense that
"although satire is usually meant to be funny, its greater purpose is constructive social criticism".
Given a flurry of hostile comments in response to that post, I have come to terms with the fact that I will likely never make a living as a skilled satirist, but given my ambitions in the field of science, I'd be damned if I didn't make another attempt at communicating my constructive criticism.
One particular difficulty with criticizing a research paper that has been accepted into a premier journal is of course that it necessarily takes place at a somewhat technical level; and given the general audience for this blog I feel justified in having avoided such a technical discussion, and having instead opted for the "easy" (but apparently not as easy as I thought!) satire. At least it seemed justifiable when I wrote the previous post, but today I am really just thinking, "Write it up anyway, and whoever wants to read it can read it." So here we go:
As I wrote above, my trouble with Prof. Bem's research article is with how the results are being interpreted. This is very different from doubting the veracity of the results themselves, and this is something I do not do. I accept that the numbers are what they are.
Although it might be useful for you to read the original research paper (you can do so by downloading it here), or Melissa Burkley's summary (here), I think you'll be able to follow my arguments even without exact knowledge of the original research. Hopefully you will also be able to follow my arguments without knowing anything about statistics, since what I am fundamentally appealing to is some common sense (and maybe a little algebra):
Now, the main purpose of the nine experiments reported in Prof. Bem's paper is to measure backward causality: an effect by which an "unforeseeable" event in the future has a causal influence on something that happens in the present. The point of such measurements would be to prove the existence of a force that somewhat paradoxically allows people to foresee the unforeseeable.
The experimental set-up features the admittedly ingenious idea of reversing the order of well-established psychological experiments, and attempting to show that effects which are typically understood as causal effects in standard research also exist in Bem's reversed design. That is, some of the psychological processes we feel so familiar with can supposedly follow causally from events that occur only AFTER the processes have already taken place. This is Bem's idea in a nutshell, and as an avid comic book reader I cannot help but draw the comparison to Spider-Man's ability to sense danger before it arrives; only with the difference that the fictional character's superpowers are based on his acute awareness of his surroundings, while the superpowers investigated in Bem's research are explicitly removed from any possibly predictive stimuli. Spidey has superpowers, Bem's participants are asked to show paranormal powers...
Let's take this as an example of one of Bem's studies: Suppose you are asked to open a hidden image on your computer screen. Your choice is between two icons; one on the left, one on the right. While you are thinking about your choice, your computer randomly flashes a sexually explicit picture onto the left half of your computer screen (really fast, so you don't really notice). The prediction from psychological theory is that this so-called priming with the sexual image will make you more likely to choose the left icon when asked to choose. This is something that psychologists have seen many times, and for which we possess rich theoretical explanations (from multiple disciplines).
The beauty of Bem's design is to run the same study in reverse mode: Your computer does not flash any picture, but you simply choose one of the icons on your computer screen. Then, AFTER you have made your choice, the computer randomly flashes a sexually explicit picture onto one half of the screen. The magic of reverse causality is then that, according to Bem, randomness will favor the spot of your chosen icon as the place where the computer flashes the erotic image. It is priming with a sexually explicit picture, only in reverse.
So the experimental design is as clever as it is simple: Just let people choose an icon, see where the priming stimulus appears and see whether the two appear together more often than would be expected by chance. This is a simple paradigm, and yet it is precisely where the problems creep in. Problems that I will now do my best to explain, in as plain a language as possible:
Suppose that your choice and the computer's random assignment of the sexually explicit picture were totally independent of each other. In particular, assume that there were no backward causality as claimed by Bem: In this case you would expect your choices and the computer's random assignments to overlap roughly 50% of the time.
However, it is important to understand that although you expect the average overlap to be precisely 50%, your chance of actually measuring an overlap of exactly 50% shrinks toward zero as the number of trials grows.
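For readers who like to check such statements themselves, here is a small simulation sketch of the "no backward causality" scenario. Everything in it is my own assumption for illustration and not Bem's actual protocol: the function names, the 100 trials per session, the 2000 sessions and the fixed random seed are all made up.

```python
import random

def overlap_count(n_trials, rng):
    """Count trials where an independent choice and an independent
    random picture position happen to land on the same side."""
    hits = 0
    for _ in range(n_trials):
        choice = rng.choice(["left", "right"])   # the participant's pick
        picture = rng.choice(["left", "right"])  # the computer's random side
        if choice == picture:
            hits += 1
    return hits

def simulate(n_sessions=2000, n_trials=100, seed=1):
    """Return (average overlap rate, share of sessions at exactly 50%).
    Session and trial counts are hypothetical values."""
    rng = random.Random(seed)
    counts = [overlap_count(n_trials, rng) for _ in range(n_sessions)]
    average_rate = sum(counts) / (n_sessions * n_trials)
    share_exact = sum(2 * c == n_trials for c in counts) / n_sessions
    return average_rate, share_exact

average_rate, share_exact = simulate()
print(f"average overlap across sessions: {average_rate:.3f}")
print(f"share of sessions at exactly 50%: {share_exact:.3f}")
```

Under these assumptions the average overlap should sit very near 0.5, while only a small share of the individual sessions land on exactly 50%.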
Right now, you might be asking, "Wait a minute, what is this guy saying?", but actually what I am saying is something rather elementary: if you run an experiment many many times, and record all your measurements as well as is humanly possible, it becomes more and more likely that the mean of your measurements lies close to the true mean value; however, at the same time the likelihood of observing a mean that is exactly equal to the true mean becomes smaller and smaller (zero essentially!).
Actually, you can think this through in terms of a simple coin toss experiment in which you throw a coin into the air 4 times: Each time you throw the coin it must come down either heads or tails. So if you draw out all the ways the coin may land (HHHH, HTHH, HTTH, etc.) you will find that there are 2^4 = 16 possible coin sequences that could occur from this experiment. Each of these outcomes is equally probable, since the event is considered entirely random. Yet, if you look closer at these possible outcomes you may notice that only 6 of them actually provide the expected equal number of heads and tails. This means that in only 6 of the 16 cases (i.e. 37.5%) you will find the exact 50% ratio you would expect on average.
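If you would rather not draw out the sequences by hand, the counting can be sketched in a few lines of Python. The function name balanced_sequences is simply my own label for this illustration.

```python
from itertools import product

def balanced_sequences(n_tosses):
    """List every heads/tails sequence of the given length, plus the
    subset with equally many heads and tails."""
    sequences = ["".join(s) for s in product("HT", repeat=n_tosses)]
    balanced = [s for s in sequences if s.count("H") == s.count("T")]
    return sequences, balanced

sequences, balanced = balanced_sequences(4)
print(len(sequences))                  # 16 possible sequences
print(balanced)                        # the 6 sequences with two heads and two tails
print(len(balanced) / len(sequences))  # 0.375
```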
This is with 4 tosses, and if you make it 6 you're already down to 31.25%. Throw the coin often enough to have any chance of ever being published in a scientific journal (let's say 1000 times?) and you're down to a probability of only about 2.5% of exactly hitting the 50% mark, a number that keeps sinking toward 0 the more often you throw.
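These percentages can be computed exactly rather than counted, by dividing the number of ways to get n/2 heads by the 2^n possible sequences. The sketch below does that, and also computes how likely the observed share of heads is to fall within a few percentage points of 50%. The function names and the 5-point window are my own choices for illustration. For 1000 tosses the exact-hit probability comes out small but not literally zero (about 2.5%).

```python
from math import comb

def prob_exact_half(n):
    """Probability that n fair tosses give exactly n/2 heads."""
    if n % 2:
        return 0.0  # an odd number of tosses can never split evenly
    return comb(n, n // 2) / 2 ** n

def prob_within(n, points):
    """Probability that the share of heads is within `points`
    percentage points of 50% (integer arithmetic avoids rounding)."""
    ways = sum(comb(n, k) for k in range(n + 1)
               if abs(100 * k - 50 * n) <= points * n)
    return ways / 2 ** n

for n in (4, 6, 1000):
    print(n, prob_exact_half(n), prob_within(n, 5))
```

The two columns move in opposite directions: with more tosses, landing close to 50% becomes nearly certain, while landing on exactly 50% becomes ever less likely.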
This alone is of course not a problem, but it is the reason why we have to perform statistical significance testing in the first place: accepting that we are very unlikely to ever record the true mean in our experiment implies that we need a way to assess what the true value might be from what we see in our measurements... not an easy task.
The way a scientific researcher will typically go about this problem is to tackle it from the opposite angle, by essentially asking: Which values may we exclude as possible true values, given our measurements?
And the best way I currently see of approaching this question for Bem's research within the limits of a blog post may be to walk you through an example that is extremely close to the Bem research design, but uses the already introduced coin toss experiment. Here is how it goes: