The replication crisis in psychology refers to concerns about the credibility of findings in psychological science. The term, which originated in the early 2010s, denotes that findings in behavioral science often cannot be replicated: Researchers do not obtain results comparable to the original, peer-reviewed study when repeating that study using similar procedures. For this reason, many scientists question the accuracy of published findings and now call for increased scrutiny of research practices in psychology.
What Led to the Replication Crisis in Psychology?
Some scientists have warned for years that certain ways of collecting, analyzing, and reporting data, often referred to as questionable research practices, make it more likely that results will appear to be statistically meaningful even though they are not. Flawed study designs and a “publication bias” that favors confirmatory results are other longtime sources of concern.
A series of replication projects in the mid-2010s amplified these worries. In one major project, fewer than half of the studies that replicators tried to recreate yielded similar results, suggesting that at least some of the original findings were false positives.
A variety of findings have come into question following replication attempts, including well-known ones suggesting that specific types of cognitive priming, physical poses, and other simple interventions could affect behavior in surprising or beneficial ways. It is important to note that psychology is not alone, however: Other fields, such as cancer research and economics, have faced similar questions about methodological rigor.
The growing awareness of how research practices can lead to false positives has coincided with extreme instances of willful misrepresentation and falsification—resulting, in some cases, in the removal or resignation of prominent scientists.
When did the replication crisis start?
The field of psychology began to reckon with reproducibility around 2010, when a particularly dubious study claimed that humans had “precognition,” or the ability to predict the future. Scientists then began to discuss methodological concerns and repeat experiments to corroborate published studies. The repeated failure to replicate many of those published findings propelled the movement forward.
What research practices have led to unreliable results?
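Journals are incentivized to publish interesting and surprising findings. This leads to publication bias, the tendency to publish positive findings rather than studies that find no effect. Researchers, for their part, are incentivized to publish as often as possible to advance their careers. They may therefore exercise flexibility in their data analysis to achieve statistical significance, for example by testing several outcome measures and reporting only the one that reaches significance, or by stopping data collection as soon as a result looks significant.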
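To see why analytic flexibility matters, consider a minimal simulation sketch. It is an illustration under stated assumptions, not a model of any particular study: two groups are drawn from the same population (so the null hypothesis is true by construction), scores are assumed to be normally distributed with a known standard deviation of 1, and the outcome measures are assumed to be independent. The function names and default settings are hypothetical choices made for this example.

```python
import random
from statistics import NormalDist, mean


def two_sided_p(group_a, group_b):
    """Two-sided z-test p-value for equal-sized groups, assuming a known SD of 1."""
    n = len(group_a)
    z = (mean(group_a) - mean(group_b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_per_group=30, n_studies=2000,
                        alpha=0.05, seed=0):
    """Share of null-effect studies that report at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: there is no real effect.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sided_p(a, b))
        # The flexible analyst reports the study if ANY outcome crosses the threshold.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print("One planned outcome:  ", false_positive_rate(n_outcomes=1))
    print("Best of five outcomes:", false_positive_rate(n_outcomes=5))
```

With a single planned outcome, the false-positive rate should sit near the nominal 5 percent. With five independent outcomes and a free choice among them, theory predicts about 1 - 0.95^5, or roughly 23 percent, even though no real effect exists in any simulated study.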
How reliable are study findings?
A landmark paper in 2015 revealed that of 97 attempts to replicate previous research findings, fewer than 40 percent were deemed successful. Another large-scale project in 2018 tested 28 findings dating from the 1970s through 2014. It found evidence for about half. An examination of 21 findings published in top-tier journals found that two-thirds replicated successfully.
Is replicability more difficult in psychology than in other sciences?
What findings from psychology have proven to be reliable?
Despite confronting challenges of reliability, even skeptical scientists still believe in a few fundamental truths about human behavior. Some of those insights are that personality traits remain fairly stable in adulthood, that individual beliefs are shaped by group beliefs, that people seek to confirm their preexisting beliefs, and more.
Understanding Research Methods
To better grasp the replication crisis, it’s worth exploring some of the statistical methods used in psychology experiments. Flexibility in research methodology can help explain why researchers unconsciously (and sometimes consciously) produce unreliable results.
What is the null hypothesis?
When conducting an experiment, a researcher develops a hypothesis. For example, they may hypothesize that spending time with friends makes people happier. They then seek to reject the null hypothesis, the default assumption that there is no effect or relationship. In this case, the null hypothesis would be that there is no relationship between spending time with friends and happiness.
What is statistical significance?
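A finding is said to be statistically significant if results at least as extreme as those observed would be unlikely to occur if the null hypothesis were true. The usual benchmark is a p-value below .05, meaning that data this extreme would arise less than 5 percent of the time if there were no real effect. Statistical significance does not give the probability that the null hypothesis is true, and it does not by itself show that a finding is reliable or will generalize to the broader population of interest.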
What is a p-value?
The p-value is the measure used to determine statistical significance: it is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true. The smaller the p-value, the stronger the evidence against the null hypothesis. The threshold for significance is generally a p-value of less than .05, although the replication crisis has led researchers to rethink their reliance on p-values or to adopt stricter thresholds such as .01. The fact that .05 is an arbitrary benchmark is, for some, further evidence that p-values are given too much credence.
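One way to make the definition concrete is a permutation test, sketched below. This is only one of several ways to compute a p-value and is not necessarily the method used in any given study. The happiness scores are made-up numbers for illustration, and the function name and defaults are hypothetical. The idea: if the null hypothesis were true, group labels would be interchangeable, so shuffling them shows how often a difference at least as large as the observed one arises by chance alone.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the labels carry no information,
        # so any reshuffling of scores into two groups is equally plausible.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Adding 1 to both counts keeps the estimate from ever being exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Hypothetical happiness ratings (1-10 scale), invented for this example.
    time_with_friends = [7, 8, 6, 9, 7, 8, 8, 7]
    time_alone = [5, 6, 5, 4, 6, 5, 7, 5]
    print(permutation_p_value(time_with_friends, time_alone))
```

A small result here means that label shuffling rarely produces a gap as large as the observed one. It does not mean the hypothesis is true with high probability, only that the data would be surprising if there were no effect.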
What are Type I and Type II errors?
A Type I error occurs when the null hypothesis is rejected even though it’s actually true, commonly called a false positive. The lower the significance threshold, or alpha level, the lower the likelihood of a Type I error. A Type II error occurs when the null hypothesis is not rejected even though it’s actually false, called a false negative. The probability of a Type II error is called beta, and a study’s power equals 1 minus beta, so the greater the power, the lower the likelihood of a Type II error.
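Both error rates can be estimated by simulation, as in the rough sketch below. It assumes normally distributed scores with a known standard deviation of 1 and a simple two-group z-test. The effect size of 0.5 and the sample sizes are arbitrary choices for illustration, and the function name is hypothetical. When the true effect is zero, every rejection is a Type I error; when the true effect is nonzero, the rejection rate estimates power, and 1 minus that rate estimates the Type II error rate.

```python
import random
from statistics import NormalDist, mean


def rejection_rate(true_effect, n_per_group, alpha=0.05, n_studies=2000, seed=0):
    """Share of simulated two-group studies that reject the null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_studies):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        # z-test for a difference in means, assuming a known SD of 1 in each group.
        z = (mean(treated) - mean(control)) / (2 / n_per_group) ** 0.5
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        if p < alpha:
            rejections += 1
    return rejections / n_studies


if __name__ == "__main__":
    # No real effect: rejections are false positives (Type I errors).
    print("Type I error rate:  ", rejection_rate(true_effect=0.0, n_per_group=30))
    # Real effect: the rejection rate is power; misses are Type II errors.
    print("Power, 30 per group: ", rejection_rate(true_effect=0.5, n_per_group=30))
    print("Power, 100 per group:", rejection_rate(true_effect=0.5, n_per_group=100))
```

Under these assumptions the first rate should land near the alpha level of .05, while power should rise from roughly one-half with 30 participants per group to above .90 with 100, which is one reason larger samples make findings easier to replicate.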
How Psychology Is Moving Forward
The replication crisis provoked heated internal debate in the field, with some arguing that it called for an overhaul of psychological science and others maintaining that the “crisis” was unreal or overblown. Nevertheless, psychologists interested in reform have pressed ahead with efforts to make the claims of psychological research more credible.
What needs to happen next?
The reformers’ immediate aims include greater transparency in study planning and data analysis, more routine follow-up testing of results to make sure they can be reliably observed, and study designs that are well-suited to the scientific questions at hand. It remains to be seen which approaches will ultimately be most useful in increasing the veracity of psychological findings.
Which practices can fix the replication crisis?
Psychologists have established an array of strategies to give future findings greater credibility. These include replicating emerging findings, relying on larger sample sizes, and using thoroughly tested measures. Another is preregistration, in which researchers record their hypotheses and analysis plan before conducting a study. Yet another is registered reports, in which journals review a study’s design before data are collected and agree to publish it no matter the results.
How has the field changed?
In addition to specific procedures to curb unreliable research practices, many organizations devoted to credibility and transparency have sprung up in the wake of the replication crisis. A few of those initiatives include the Open Science Collaboration, the Society for the Improvement of Psychological Science, the Psychological Science Accelerator, and PsyArXiv.