Scientific findings have started to develop a public reputation for being unreliable. Results from large clinical trials are often reversed in a later study, to the surprise of many. More often than not, basic laboratory findings, especially ones that are thought to be spectacular at the outset, cannot be replicated. In psychology and the cognitive sciences, this problem has become especially pernicious and controversial.
To address these issues, the National Institutes of Health has started to design formal funding mechanisms for replication, something that has been very rare in the past.
What are your thoughts as a philosopher working at the intersection of philosophy of science and philosophy of mind?
Rosa Cao (RC): I think replication is overemphasized. Don't get me wrong, it is a kind of minimum standard, and certainly we should worry about declining effect sizes in psychology. But in the end, what matters is whether the experiments show what they purport to show. By increasing the sample size, successful replications increase the credibility of the actual data. But it won’t help us if the original setup was defective in design, and it won’t correct a misinterpretation of the data.
Replication is meant to be a reality check. Was this finding a fluke? More cynically, it can detect both intentional fraud and innocent wishful manipulation of results. If two groups with different incentives nevertheless produce the same data, then we can feel more confident that the results were not skewed by experimenter incentives, conscious or unconscious. But replication is not the only reality check, and not necessarily the best one. The gold standard in biology is mechanism. A clear biological mechanism that produces effects in a predictable way lends significantly more credibility to our results.
SL: Interesting observation. The replication problem has two facets: there is flawed science and then there are bad scientists.
RC: Calling for replication is interpreted as an accusation of wrongdoing, and we see people getting angry and closing ranks. Bissell, for example, argues in a Nature op-ed that it is irresponsible to call for replications, and blames failures of replication on incompetence in would-be replicators. I’m sympathetic to her frustration – it is like someone telling you your recipe doesn’t work when they are simply incompetent cooks. On the other hand, if results depend that heavily on the skill of the experimenter, then many published findings are so fragile that we can’t conclude very much from them (let alone, say, generalize from mouse results to human ones).
But sometimes results don’t pan out for innocent reasons. We know that data are always noisy, and that cherry picking, misinterpretation, bad design, etc. are common. Replication per se will not do a good job of either identifying those problems or solving them.
Sometimes the utility of replication is best illustrated by “debunking” replications: a later group finds the same data that a previous group did, but with a different set of hypotheses. That might be the most powerful kind of debunking, but strictly speaking, it is a successful replication, where we find the same data, but realize that it merits a different interpretation than the one originally given. So the issue is not replication, it's interpretation.
SL: What are some of the new developments in the philosophy of science that are relevant to our present discussion?
RC: Philosophy of science, back in the good old days when physics was dominant, was obsessed with laws. Now that the biological sciences are ascendant, philosophers have started to pay more attention to other ways that we have of making scientific progress. They argue that what we look for are not laws without exceptions of the kind found in physics, but rather contextualized generalizations, where we know how something works, and it does work most of the time, but not always.
Those generalizations are embodied in descriptions of the mechanism by which some phenomenon of interest is produced (see the Machamer, Darden and Craver paper). Those descriptions allow us to link the new phenomenon of interest to simpler parts and functions that are already understood, or at least partially understood, in broader contexts. If we know what the parts are, and how those parts in turn work, we can not only make generalizations, but we can also estimate how good those generalizations are and what their scope is, that is, how far they extend beyond the original experimental context. Those are the situations we really care about. If an experiment is perfectly replicable, but only in the lab, then what good does it do medicine, or our understanding of cognition “in the wild”?
SL: Many different levels of analysis are important. Medicine is broad enough that, without knowing mechanisms in detail, we can still make important and useful inferences. And given that our budget for research is limited, it becomes critical to decide how to prioritize funding for studying mechanism versus systematically categorizing and analyzing phenomenology, both of which are expensive.
RC: I still think that replication in medicine is really a second-class substitute. When, as is so often the case in medicine, we don’t yet know how something works, we might settle for increased certainty that it works. Ideally, we want to know how a medical treatment does its job. But we’ll settle for effective treatment that often works for most people, even if we don’t know exactly how.
Perhaps you might say that in medicine, we can't afford to ignore phenomenology. If a drug seems to cure cancer, we have a responsibility to believe that it does, and get it out there, even if we don't know how it works. But not knowing how it could even possibly work, that is, a total lack of even a potential mechanism, is prima facie evidence against the result, especially if it is statistically weak.
SL: Perhaps the criticism against social psychology experiments shouldn’t be that they don’t replicate, but that they don’t focus enough on systematizing mechanisms and making underlying theories more rigorous.
RC: Or maybe in something like social psychology, it is too hard to look for mechanism. We just don't know enough about how the biological parts (cells, transmitters, etc.) manage to produce complex social behaviors. But perhaps that is good reason to doubt the utility of social psychology experiments beyond the relatively narrowly delineated phenomena that they investigate directly. For almost any field, we need to remind ourselves that experimental claims often don’t generalize as widely as we would like, and are most reliable when restricted to the actual phenomena observed.
SL: The real problem I see for research in both basic, mechanistic biological science and applied human subject research, including clinical trials, is that the number of hypotheses is growing exponentially. We live in the age of –omics, when signals derived from hundreds of thousands of genes and brain regions can be measured and tested for correlations simultaneously. We make the assumption that reductionism works. For example, knowing more about the brain circuitry involved in addiction can help us alleviate alcoholism. But which of the millions of nodes in the circuit are relevant? That, as I see it, is the most challenging problem facing us today. Replication is, in some ways, a relic from an era of cottage industry science, when insights could be gathered from a single hypothesis and a single experiment.
RC: It’s true that we are facing a proliferation of both hypotheses and data. So perhaps we don’t need replication in areas where we now have huge sample sizes. But those huge sample sizes go along with hypotheses that have small effect sizes – statistically significant but not useful in the end. Genomics has not yet taught us much about how diseases work, or how to treat them. It has given us an overwhelmingly complex picture of which markers are associated with which other markers. That is yet another reason to go after mechanism. By identifying a particular underlying neural pathway, or a set of molecules and receptors involved, we are a step closer to gaining insight into human behaviors.
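The remark that huge samples go along with small but statistically significant effects can be made concrete with a short calculation. The following is a minimal sketch under assumed numbers, not something taken from the interview: it uses a two-sample z-test with unit variance in each group, a hypothetical standardized effect of 0.02, and hypothetical group sizes of 100 and 100,000.

```python
import math


def two_sample_z(effect_size, n_per_group):
    """Return (z, two-sided p) for a difference in means between two
    equally sized groups, assuming unit variance in each group."""
    # Standard error of the difference between two independent means.
    se = math.sqrt(2.0 / n_per_group)
    z = effect_size / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical, illustrative values: the same tiny effect at two sample sizes.
z_small, p_small = two_sample_z(effect_size=0.02, n_per_group=100)
z_large, p_large = two_sample_z(effect_size=0.02, n_per_group=100_000)

print(f"n=100 per group:     z={z_small:.2f}, p={p_small:.3g}")
print(f"n=100000 per group:  z={z_large:.2f}, p={p_large:.3g}")
```

With 100 subjects per group the effect is nowhere near significance, while with 100,000 per group the same effect yields a z of about 4.47. The effect itself has not changed: it is still a fiftieth of a standard deviation, which is why significance alone says little about usefulness.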
SL: There is a response from machine learning. Complex data may not require understanding of all the underlying mechanisms to produce useful, predictive products. Algorithms can detect mechanisms that humans have a hard time articulating. And even if we don’t have all the mechanisms, we can still emulate useful functions. The Google car is the perfect example: we know very little about how humans mechanistically drive a car. Mechanism reduces dimensionality and improves predictive performance.
To bring the discussion back to the original point, if we think about the scientific method as a Bayesian learning process, as long as the “sample size” (i.e. evidence) increases, we asymptotically approach the “right answer” (the correct mechanism). The right answer might just be a complex model trained on a large set of data rather than individual statistical hypotheses.
But if there are systematic biases (for example, publication bias), that’s no longer true. So even with replication and sophisticated modeling, a large portion of scientific findings can still become quite biased. Becoming aware of these problems and addressing them is possibly the most important task.
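The effect of publication bias on this kind of evidence accumulation can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model proposed in the interview: the true effect (0.1), the study size (25), the number of studies (2000), and the rule that only results with z above 1.96 get published are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_studies(true_effect, n_per_study, n_studies, seed=0):
    """Simulate study-level effect estimates: each study reports the mean
    of n_per_study draws from a normal with unit variance."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n_per_study)]
        estimates.append(statistics.fmean(sample))
    return estimates


def published_only(estimates, n_per_study, z_threshold=1.96):
    """Keep only the studies whose estimate is 'significantly positive',
    a crude stand-in for publication bias."""
    se = 1.0 / n_per_study ** 0.5  # known standard error under unit variance
    return [e for e in estimates if e / se > z_threshold]


# Hypothetical, illustrative parameters.
TRUE_EFFECT = 0.1
N_PER_STUDY = 25

estimates = simulate_studies(TRUE_EFFECT, N_PER_STUDY, n_studies=2000)
published = published_only(estimates, N_PER_STUDY)

all_mean = statistics.fmean(estimates)
pub_mean = statistics.fmean(published)

print(f"true effect:              {TRUE_EFFECT}")
print(f"mean of all studies:      {all_mean:.3f}")
print(f"mean of published only:   {pub_mean:.3f}")
print(f"published {len(published)} of {len(estimates)} studies")
```

Averaging all studies recovers something close to the true effect, but averaging only the published ones cannot fall below the significance cutoff (about 0.39 here), no matter how many more studies are run. More data of the same filtered kind does not fix the problem, which is the sense in which the asymptotic argument breaks down.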
RC: Science is supposed to be self-correcting. This is another reason to move beyond replication alone. When new experiments build on earlier ones (rather than merely seeking to repeat them), we get a second check on whether the earlier results are reliable. The most widely practiced methods are the most believable. There’s this great old paper by Ian Hacking, where he says, “if you can spray them, then they are real.”
This interview was conducted by e-mail and edited.
Peter Machamer, Lindley Darden, and Carl F. Craver (2000). Thinking about Mechanisms. Philosophy of Science 67(1): 1-25.
Mina Bissell (2013). Reproducibility: The risks of the replication drive. Nature 503: 333–334 (21 November 2013). doi:10.1038/503333a
“We are completely convinced of the reality of electrons when we set out to build - and often enough succeed in building - new kinds of device that use various well-understood causal properties of electrons to interfere in other more hypothetical parts of nature.” Ian Hacking (1982). Experimentation and Scientific Realism. Philosophical Topics 13(1): 71-87.