I appear to be in a tiny minority of psychological researchers who believe failures to replicate influential, classic studies are just as important to publish as the originals. Studies typically do not become highly cited classics because they receive a reception of "hmm, that's interesting, let's see if it really holds up." They become classics because they tell stories that many other researchers want to believe (they confirm theoretical or political perspectives that researchers hold dear).
Researchers love to point out the inherent ambiguity of failed replications, and they are right. Replications may fail for many reasons, including:
- The original finding is false.
- The original finding is true, but the failure was just a random blip.
- The replication attempt differed from the original study in ways crucial to getting the effect. These differences could be anything, including but not restricted to the participants, methods, or environment. (This point is more important than it seems, because it suggests that the original finding is likely to be less powerful, less pervasive, and less common than first believed.)
Far less frequently, however, researchers fail to point out the inherent ambiguity in the original "successful" studies finding significant effects. False "successes" can occur because:
- The study was conducted with complete integrity; nonetheless the finding is false simply because the apparent success was just a random blip.
- The researchers used techniques that have little or no parallel in the rest of life, so, even though the finding may be real under the arcane and artificial conditions created in the lab, the finding will be generally irreplicable.
- In a nonexperimental study, the researchers just happened to stumble on circumstances that produce their obtained effect, even though such circumstances are relatively unusual.
- The researchers, intentionally or not, rigged the methods of the study to confirm their hypotheses.
- Experimenter effects occurred whereby the researchers, intentionally or not, biased the responses of participants to confirm the hypotheses.
- The researchers used Questionable Research Practices which produced the appearance of hypothesis-confirmation (such practices have been used to "demonstrate" manifestly false findings).
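The first item on this list, the honest "random blip," can be made concrete with a small simulation. What follows is only an illustrative sketch, not something taken from any of the studies or papers cited here. It assumes two groups of 50 participants drawn from the same normal distribution (so the true effect is exactly zero), a normal-approximation test of the difference in means, and the conventional p < .05 cutoff. The function names, sample sizes, and seed are hypothetical choices made for the example.

```python
import random
import statistics
from statistics import NormalDist


def simulate_null_study(n_per_group, rng):
    """Run one two-group study in which the true effect is exactly zero.

    Returns the two-sided p-value for the difference in group means.
    """
    # Both groups come from the same population: any difference is noise.
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    mean_diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    # Standard error of the difference between two independent means.
    se = (statistics.variance(group_a) / n_per_group
          + statistics.variance(group_b) / n_per_group) ** 0.5
    z = mean_diff / se
    # Two-sided p-value from the normal approximation (an assumption;
    # a t-test would be slightly more conservative at this sample size).
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_studies=2000, n_per_group=50, alpha=0.05, seed=1):
    """Share of zero-effect studies that nonetheless reach p < alpha."""
    rng = random.Random(seed)
    hits = sum(
        simulate_null_study(n_per_group, rng) < alpha
        for _ in range(n_studies)
    )
    return hits / n_studies


if __name__ == "__main__":
    rate = false_positive_rate()
    print(f"Share of null studies reaching p < .05: {rate:.3f}")
```

Under these assumptions the share should land somewhere near 5 percent: about one in twenty studies of a nonexistent effect looks like a "success" even when everything is done with complete integrity. The other items on the list can only push that share higher.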
Furthermore, many studies in social psychology and other fields, some quite influential and classic, have proven irreplicable (see references below). Look for my next blog post, which will be a simple list of irreplicable or difficult-to-replicate studies in social psychology.
Many journals in social psychology explicitly disdain publishing attempted replications on the grounds that "there is no new contribution." Those grounds are severely misguided. If the replication succeeds, the new contribution is the confirmation that the original study was in fact correct and that its conclusions are likely to be justified. Going from the first study ("Interesting, I wonder if that result holds up and is really believable") to the replication ("Wow, that is really true") seems just as large a marginal increase in knowledge as going from nothing to the first study.
If the replication fails, the conclusion is that the original study, no matter how innovative, creative, or "important," cannot necessarily be believed or taken at face value, and additional research is needed to identify whether and under what conditions (if any) the effect appears. That is a ton of new information, and it warrants publication in the same journal that published the original article.
In addition to journal policies against publishing replications and reviewer disdain for replications as unoriginal, failed replications are often very difficult to publish because they are (dysfunctionally, in my opinion) sent to the original researchers for review, and those researchers, of course, have a vested interest in making sure the failed replication never sees the light of day.
The journals have exactly the wrong policies. Replications of previously unreplicated studies should have higher-than-usual priority, and failures to replicate should almost never be sent to the authors of the original study for review. They should go instead to researchers with no conflict of interest.
It is quite reasonable for laypeople and experts alike to be skeptical of one or two reports, no matter how "scientific" they are and no matter how prestigious the journal in which they are reported. In almost any other walk of life, a single demonstration of almost anything (a baseball player getting a hit, a broker recommending a good stock, a restaurant providing a mediocre appetizer) is not taken as strong evidence for any conclusion (in these examples, that the player is a great hitter, the broker will make you tons of money, or the restaurant is awful). Scientific "results" should be treated identically: not believed, or believed only very tentatively, pending consistent demonstrations that such results actually hold up.
I would go further: No one should ever take seriously the conclusions reached in ANY study (including mine!) unless and until they have been replicated, preferably more than once, by someone other than the original researchers, their collaborators, or current or former students.
Bartlett, T. (2012). Is psychology about to come undone? Chronicle of Higher Education. Retrieved on 8/7/12 from: http://chronicle.com/blogs/percolator/is-psychology-about-to-come-undone/29045
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2, 696-701.
John, L. K., Loewenstein, G., & Prelec, D. (In press). Measuring the prevalence of questionable research practices with incentives for truth-telling. Psychological Science.
Jussim, L. (2012). Social perception and social reality: Why accuracy dominates bias and self-fulfilling prophecy. NY: Oxford University Press. (Abstracts, first chapter, and excerpts available at: http://www.rci.rutgers.edu/~jussim/)
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359-1366.