A Quick Guide to the Replication Crisis in Psychology
Only one psychology finding in three can be reproduced. Is that a problem?
Posted September 6, 2015
Brian Nosek and the Open Science Collaboration:
We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. [Science article here.]
- Only 36% of the replications were "successful" (i.e., produced p values < .05).
- The average effect size of the replications was around half that of the original studies.
- Weaker, more surprising findings were less likely to replicate.
- Social psychology findings were less than half as likely to replicate as findings in cognitive psychology.
- Main effects were more likely to replicate than interaction effects.
Some of the findings that didn't replicate...
- People are more likely to cheat after they read a passage informing them that their actions are determined and thus that they don't have free will.
- People make less severe moral judgements when they've just washed their hands.
- Partnered women are more attracted to single men when they're ovulating (when the women are ovulating, that is). [See here, here.]
What's the Problem?
The replication rate is worryingly low. The epidemiologist John Ioannidis estimated that around 50% of published findings are false, at least in medicine - so this result is worse than even he was expecting.
And the actual replication rate may be even lower than 36%. All the studies were taken from three top-tier psychology journals. Who knows what the success rate might be further down the totem pole, in lower-ranking journals?
Taken at face value, Nosek's study suggests that any given finding in psychology is more likely to be fiction than fact. If you're studying psychology, a lot of what you're being taught is false. And if you're teaching psychology, you're potentially devoting a big chunk of your life to spreading falsehoods. That's a problem.
"A replication rate of 36% isn't actually too bad."
Science needs to involve taking risks and pushing frontiers, so even an optimal science will generate false positives. If 36 percent of replications are getting statistically significant results, it is not at all clear what that number should be. [More]
Verdict: Disagree. First, the high false-positive rate isn't just due to intellectual risk taking. It's due as well to small sample sizes, p-hacking, the file-drawer phenomenon, publication bias, and - in some cases - the outright fabrication of data.
Furthermore, even if none of these things were an issue, the replication rate would still be worryingly low given how rarely replications are attempted or published in psychology. A 36% replication rate might be fine if we paid more attention to rooting out false positives from the literature - but we don't, so it's not.
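Both the low replication rate and the shrunken effect sizes are exactly what you'd expect when underpowered studies only get published if they cross p < .05. Here's a minimal simulation sketch of that selection effect (the true effect size, sample size, and number of simulated studies are illustrative assumptions, not figures from the Open Science Collaboration study):

```python
# Illustrative simulation: when underpowered studies are published only
# if p < .05, the published effect sizes overestimate the true effect -
# one reason replications come in smaller than the originals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d = 0.2          # assumed small true effect (Cohen's d)
n = 20                # assumed small per-group sample size
published = []

for _ in range(20000):
    a = rng.normal(true_d, 1, n)   # "treatment" group
    b = rng.normal(0, 1, n)        # control group
    t, p = stats.ttest_ind(a, b)
    if p < .05 and t > 0:          # file drawer: only significant results get out
        # observed standardized effect size in the published study
        d = (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        published.append(d)

print(f"true effect: {true_d}")
print(f"mean published effect: {np.mean(published):.2f}")  # far above the true 0.2
```

With these numbers, the average published effect is several times the true one, so an exact replication - which isn't filtered through the significance gate - will typically look much weaker than the original.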
"It's not just psychology!"
Verdict: Agree. Some fields are doing better than psychology, no doubt - but some are doing worse. [See, e.g., here]
"It's not all of psychology. Some areas are worse than others."
Verdict: Agree. [See, e.g., here.]
I like this project but don't like the parochial way the results are communicated. First, the results tend to be presented as if they are representative of psychology--they are not. For example, psychophysics and memory are under-represented. More importantly, a lot of psychology does not use laboratory experimentation--data are collected from archival sources. In some of these areas, the important observations have been replicated hundreds of times (think of the age-crime correlation, for example). Secondly, meta-analytic techniques that have explicitly been designed to address issues of replicability and are now highly sophisticated are totally ignored.
"There's no crisis - the detection and rejection of false findings is just part of science. Psychology's doing just fine."
Lisa Feldman Barrett:
the failure to replicate is not a cause for alarm; in fact, it is a normal part of how science works... Suppose you have two well-designed, carefully run studies, A and B, that investigate the same phenomenon. They perform what appear to be identical experiments, and yet they reach opposite conclusions. Study A produces the predicted phenomenon, whereas Study B does not. We have a failure to replicate.
Does this mean that the phenomenon in question is necessarily illusory? Absolutely not. If the studies were well designed and executed, it is more likely that the phenomenon from Study A is true only under certain conditions. The scientist's job now is to figure out what those conditions are, in order to form new and better hypotheses to test. [More]
Verdict: Disagree.
For a start, as Dorothy Bishop from the University of Oxford noted on Twitter, it “raises [the question] of how seriously to take findings that depend so precisely on conditions.” In other words, if the results are delicate wilting flowers that only bloom under the care of certain experimenters, how relevant are they to the messy, noisy, chaotic world outside the lab? [More]
On top of that, the researchers took pains to replicate the studies as precisely as possible, which should have eliminated any meaningful context effects.
"OK, the news isnt great. But if we look at the findings through a Bayesian lens, they're a little less grim than they first appear..."
Verdict: Agree. Taking a Bayesian approach, Alex Etz described the study's outcome like this:
- Strong Replication Successes: ≈25%
- Moderate Replication Successes: ≈10%
- Inconclusive: ≈30%
- Moderate Replication Failures: ≈20%
- Strong Replication Failures: ≈20% [More]
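Categories like these come from comparing Bayes factors against conventional evidence thresholds rather than from a p < .05 cutoff. As a rough illustration (my own sketch, using a BIC-based approximation for a two-sample t-test rather than Etz's actual default-Bayes-factor analysis), here's how a single replication result might be scored and labelled:

```python
# Sketch: approximate Bayes factor for a two-sample t-test via the
# BIC approximation, then Jeffreys-style evidence labels. This is an
# illustrative stand-in, not the method Etz actually used.
import math

def bic_bayes_factor_10(t, n1, n2):
    """Approximate BF10: evidence for an effect over the null."""
    n = n1 + n2
    df = n - 2
    # BIC-approximation Bayes factor for the null (unit-information prior)
    bf01 = math.sqrt(n) * (1 + t**2 / df) ** (-n / 2)
    return 1 / bf01

def label(bf10):
    if bf10 > 10: return "strong success"
    if bf10 > 3: return "moderate success"
    if bf10 > 1/3: return "inconclusive"
    if bf10 > 1/10: return "moderate failure"
    return "strong failure"

# Hypothetical replication: t = 2.5 with 50 participants per group.
# Significant by the p < .05 standard, yet the Bayes factor is modest.
bf = bic_bayes_factor_10(2.5, 50, 50)
print(f"BF10 = {bf:.2f} -> {label(bf)}")  # prints "BF10 = 2.20 -> inconclusive"
```

The example makes the Bayesian point concrete: a result that counts as a "successful" replication under the p < .05 criterion can still provide only weak, inconclusive evidence for the effect.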
The Last Word
To be clear, none of this implies that psychology is invalid as a science, or that it's a uniquely flawed field of inquiry. We're making progress, and this research is part of that. We’ve just got further to go than most of us thought!
Thanks to fellow blogger Robert King for stimulating discussion of these issues.
For more of the same only different, follow me on Twitter.