Replication.

    It's one of those arcane words that psychological and psychiatric researchers use regularly, but precious few people in the general public have heard of, let alone understand. Yet if I had to compile a list of 10 terms that every educated layperson interested in psychology should know, "replication" would be high on that list. It refers to the ability of researchers, ideally independent researchers, to duplicate earlier findings. Independent replication is especially crucial, because one investigative team can keep repeating its mistakes, resulting in the erroneous appearance of a trustworthy finding.

    Replications in psychology are extremely important, especially because scores of interesting findings turn out to be flukes. Remember the famous 1968 Rosenthal and Jacobson "bloomer" study showing that artificially inducing positive expectations in schoolteachers can produce higher IQs in their students? Lots of people, including many educated laypeople, have heard about this finding, but few know that a slew of later researchers found these results difficult to replicate. The original Rosenthal and Jacobson finding either isn't robust or (more likely) is small in magnitude, especially in the real world, where teachers have the opportunity to interact extensively with their pupils - interaction that swamps the effects of teacher expectations. The same replicability problem applies to medicine; in a 2005 article, John Ioannidis found that about a third of findings in clinical trials don't hold up in later studies.

    But there's a problem with replications: They aren't sexy. To many, replications seem like "old news." So the news media - which, after all, report on what's "new" - often ignore them. Take the extrasensory perception (ESP) literature, in which the media routinely report with great fanfare any hint of a positive research finding, or even a supportive anecdote, but barely mention the hundreds of failed replications of purported psychic phenomena accumulated over more than 150 years.

      Replications aren't sexy to journal editors either. Early in my career, when I was still a graduate student, I submitted (along with a co-author) an article to a major psychological journal that was essentially a replication and minor extension of earlier findings on the symptomatic differences between two overlapping childhood conditions, attention-deficit/hyperactivity disorder and oppositional defiant disorder. The reviews we received were quite positive, but the editor initially declined to publish the paper on the grounds that our study was "only" a replication of a previous finding (to give the editor his due, he was willing to be persuaded, and ended up publishing our article following a substantial revision). But in most cases, replications are probably even more important than the original finding, because so many initial findings don't hold up in later research.

    That's why an article that appeared on the front page of the New York Times on June 17th put a smile on my face. It did so not because I have any particular intellectual or personal investment in the finding - I don't - but because it marked one of the first times I can recall in which a failure to replicate a finding received almost as much media coverage as the original finding. This article, written by the able New York Times psychology reporter Benedict Carey, reported that a widely ballyhooed finding - first reported by Avshalom Caspi and his colleagues in a 2003 article in the prestigious journal Science - didn't hold up when 14 other studies were combined in what psychologists call a meta-analysis, a statistical technique that allows investigators to combine multiple studies and treat them as though they were one big study. Specifically, in 2003, Caspi and his collaborators had found that a specific gene variant relevant to the neurotransmitter serotonin "interacted" with life stress in boosting risk for depression. That is, people with both the gene variant and life stress were especially depression-prone, so that the "effects" of genetic and environmental influences were multiplicative, not additive.
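For readers curious about the mechanics, the core idea of a meta-analysis can be sketched in a few lines of code. This is a minimal illustration of the simplest (fixed-effect, inverse-variance) approach, and the numbers in it are entirely hypothetical - they are not the actual values from the Caspi study or the Risch re-analysis:

```python
# Minimal sketch of a fixed-effect meta-analysis using inverse-variance
# weighting. All effect sizes and standard errors below are hypothetical
# illustrations, not data from any real study.

def meta_analyze(effects, std_errors):
    """Combine per-study effect estimates into one pooled estimate.

    Each study is weighted by the inverse of its variance, so larger,
    more precise studies count for more in the combined result.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

# Hypothetical pattern: one small, imprecise study reporting a large
# effect, followed by larger, more precise studies clustered near zero.
effects = [0.60, 0.05, -0.02, 0.08, 0.00]
std_errors = [0.30, 0.10, 0.08, 0.12, 0.09]

pooled, se = meta_analyze(effects, std_errors)
print(f"pooled effect = {pooled:.3f} (SE = {se:.3f})")
```

The pattern in the toy data mirrors the story told in the article: a striking initial effect can all but vanish once it is pooled with later, larger studies that find little or nothing.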

       To many observers, the Caspi finding was especially appealing because it dovetailed with popular - and arguably politically correct - notions of "gene-environment interaction." It became one of the most widely discussed and cited findings in modern psychology; as of this writing (June 28th, 2009), it's been cited a staggering 1996 times in the scientific literature (by way of comparison, the modal number of citations for journal articles in psychology is 0 - yes, zero), and it was widely hailed as among the most significant scientific findings of the decade. Yet the new meta-analysis, led by geneticist Neil Risch and published in the Journal of the American Medical Association (JAMA), showed that when the data from other studies were combined along with the original Caspi findings, the interaction effect vanished. http://jama.ama-assn.org/cgi/content/full/301/23/2462

      Of course, it's possible that this negative verdict may itself change over time with the emergence of new findings, and that Caspi and colleagues will ultimately be vindicated. The beauty of science is that it's self-correcting in the long term, even if it's often messy in the short term. Eventually, the truth regarding serotonin genes, stressful life events, and depression will out. But with the appearance of the JAMA article, the ball is now in the court of Caspi and colleagues, not in the court of their critics, to show that their interaction effect wasn't a mirage.

      What lessons can we draw from this episode? We shouldn't put too much trust in any psychology finding unless and until a different investigative team has replicated it. We should also remember that the news media rarely appreciate the importance of replication, so they're liable to hype surprising findings before others have duplicated them. And investigators themselves should strive to contain their excitement about new findings until others have found them to be dependable. In the interests of full disclosure, I should note that I may be guilty of having violated this precept. On two occasions, I have published findings of interactions in the domains of personality and psychopathology, and to my knowledge nobody has tried to replicate them. In retrospect, I wish I had been more circumspect in reporting them, in part because (for a host of statistical reasons I won't bore readers with) interactions may be especially unlikely to replicate, and in part because I've since come to recognize how easy it is to fall in love with one's results - especially when they dovetail with one's hypotheses.

     Finally, as consumers of the psychological literature, we should remember a nugget of wisdom that my wise Ph.D. mentor, the late David Lykken, was fond of dispensing: In general, the more intriguing a psychological finding, the less likely it is to replicate. With a few exceptions, David was probably right, because the more a finding contradicts accumulated knowledge from carefully conducted research, the more likely it is to be wrong. Of course, surprising findings will occasionally turn out to be true, so in interpreting such findings we need to walk a fine line between excessive skepticism and excessive open-mindedness. But if Lykken is correct, the amount of media coverage a finding receives - which usually reflects its counterintuitiveness - may actually be inversely related to its trustworthiness. Caveat emptor.

 

You are reading

The Skeptical Psychologist
