Update August 26, 2012: This blog post has been repeatedly spammed with comments from persons associated with Scientology. I guess the fact that it irritates some fringe elements is a mark of its success.
If Irving Kirsch's soundbites on CBS 60 Minutes News seem to roll out smoothly and well-rehearsed, perhaps it is because he's been saying these things for 15 years. It is instructive to compare what he is saying now with what he wrote with Guy Saperstein in 1998 in the now-defunct American Psychological Association online journal Prevention and Treatment, not only to appreciate the continuity with what he says now, but also to see how he deals with critics. The journal carried his target article, but also critical commentaries with responses from Kirsch. I can't do justice to everything that went on in this exchange within a short blog post, so I encourage interested readers to take advantage of the links I provided here and read the articles for themselves. Moreover, if readers have trouble making sense of Kirsch's target article, they are certainly not alone, and they might simply want to jump to the exchange between Kirsch and his critics and then refer back to the article to see what Kirsch actually said. This post focuses specifically on the 1998 exchange and leaves discussion of another article by Kirsch in the same journal in 2002, and the exchanges that accompanied it, for a possible later post.
In the target article Listening to Prozac but Hearing Placebo, Kirsch and Saperstein laid out their argument that only approximately a quarter of the response observed to administration of an antidepressant was due to the antidepressant itself, half was a placebo response to receiving a pill at all (the response also seen with the inert lactose tablet in the pill placebo condition), and the remaining quarter was nonspecific.
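The arithmetic behind this three-way split can be sketched as follows. This is a minimal illustration under stated assumptions, not a reconstruction of Kirsch and Saperstein's actual computation: it takes three hypothetical within-condition effect sizes (for drug arms, pill placebo arms, and no-treatment groups, the last of which the article estimated from waiting-list conditions in psychotherapy trials) and assumes the components simply add, which is exactly the assumption Dawes rejected. The function name `decompose_response` and all input values are invented for illustration.

```python
from __future__ import annotations


def decompose_response(d_drug: float, d_placebo: float, d_no_treatment: float) -> dict[str, float]:
    """Split the drug-arm response into additive shares (hypothetical helper).

    Assumes additivity: drug-arm response = nonspecific change (natural course,
    estimated from no-treatment groups) + placebo increment + drug increment.
    """
    if d_drug <= 0:
        raise ValueError("drug-arm effect size must be positive")
    return {
        "nonspecific": d_no_treatment / d_drug,
        "placebo": (d_placebo - d_no_treatment) / d_drug,
        "drug": (d_drug - d_placebo) / d_drug,
    }


# Hypothetical within-condition effect sizes, chosen only to illustrate the arithmetic.
shares = decompose_response(d_drug=1.0, d_placebo=0.75, d_no_treatment=0.25)
print(shares)  # {'nonspecific': 0.25, 'placebo': 0.5, 'drug': 0.25}
```

With these made-up inputs, the placebo arm "duplicates" 0.75 / 1.0 = 75% of the drug-arm response, and the shares come out as a quarter drug, half placebo, and a quarter nonspecific. The shares always sum to one by construction; nothing in the arithmetic tests whether additivity actually holds.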
In 1998 Kirsch was willing to concede that antidepressants had a "considerable benefit over placebo," but argued that the placebo component of the response was considerably greater than any pharmacological effect.
The article is quite dense and difficult to follow in its twists and turns of logic and its sudden declarations of definitive conclusions. Kirsch and Saperstein claimed their interpretations were based on rather complicated meta-analyses, which were described in a way that makes it unlikely anyone else could replicate them. Their examination of past reviews and computer searches had yielded 1500 publications, but only 20 were retained as meeting inclusion criteria. Technically, the important point is that their calculation of within-condition effect sizes was nonstandard: for measures of depression, it took the mean posttreatment score minus the mean pretreatment score, divided by the pooled standard deviation.
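A minimal sketch of the within-condition effect size as described above, assuming the pretreatment and posttreatment scores come from the same patients (that is, completers, the point Klein objected to) and taking the pooled standard deviation as the square root of the average of the pre and post sample variances. The exact pooling Kirsch and Saperstein used is not given here, so that choice is an assumption; the function name `within_condition_d` and the scores are hypothetical.

```python
from __future__ import annotations

import statistics


def within_condition_d(pre: list[float], post: list[float]) -> float:
    """Within-condition effect size: (mean post - mean pre) / pooled SD.

    Assumes pre and post scores come from the same patients (completers only),
    so both samples have equal size and the pooled SD is the square root of
    the average of the two sample variances.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need equal-length pre/post samples with at least 2 scores each")
    pooled_sd = ((statistics.variance(pre) + statistics.variance(post)) / 2) ** 0.5
    return (statistics.mean(post) - statistics.mean(pre)) / pooled_sd


# Hypothetical depression-scale scores for five completers in one arm.
pre_scores = [24.0, 26.0, 28.0, 22.0, 30.0]
post_scores = [14.0, 18.0, 20.0, 12.0, 16.0]
d = within_condition_d(pre_scores, post_scores)
print(round(d, 2))  # -3.16
```

Because improvement on a depression scale means lower scores, the formula as literally stated gives a negative value for improvement, so the sign is usually flipped when reporting. The deeper problem, as Dawes stressed, is that a pre-post change within a single arm is not by itself a treatment effect or a placebo effect.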
Donald Klein delivered a detailed and scathing response in Listening to Meta-Analysis but Hearing Bias. He documented his claims that Kirsch's article depended on
"a miniscule group of unrepresentative, inconsistently and erroneously selected articles arbitrarily analyzed by an obscure, misleading effect size... The attempt to further segment the placebo response, by reference to psychotherapy trials incorporating waiting lists, is confounded by disparate samples, despite Kirsch and Saperstein's claim of similarity."
Again, interested readers can examine Klein's criticisms for themselves in the context of the larger exchange, but he started by questioning the reliability of coding and the representativeness of the small sample of 19 studies that Kirsch's search had yielded. Klein then went on to detail how Kirsch's effect sizes depended on completer data rather than intent-to-treat data. The crux of Klein's argument, however, was a careful study-by-study critique of the appropriateness and validity of the studies on which Kirsch relied. He next turned to the studies that Kirsch used to estimate the natural course of untreated depression and noted that they were analog studies of college students.
Klein ended by declaring that publication of the article represented a failure of peer review.
Robyn Dawes provided a very brief critique of Kirsch's article that, although not as detailed as Don Klein's, was almost as scathing. It opened with the declaration that
"the simple posttreatment minus pretreatment difference in outcome variable for treatment group over placebo group does not define either a treatment effect or placebo effect, even for groups randomly constructed. Such differences must be compared with the difference obtained from a (randomly selected) no treatment group in order to evaluate the effect of treatment or placebo... Effect involves a comparative judgment, not just a pre-post one. And even when a legitimate effect is found for placebo, it often (almost always?) makes little sense to talk of a proportion of a treatment effect has been accounted for by a placebo effect. The logic of Kirsch and Saperstein (1998) is seriously flawed. Science (like art and life) is not that easy."
Dawes didn't let up:
the unusual—to use a kind word—nature of this analysis is best illustrated in the first sentence of the discussion section... "No treatment effect sizes of effect sizes for placebo response will calculate from different sets of studies." There's no such thing as a no treatment effect size.
Dawes ends with:
If knowledge were simply attained as implied by Kirsch and Saperstein's article - by justifying effect sizes a pre-post comparison—it would not even be necessary to randomize. We would know enormous amounts more than we currently do. Unfortunately, we don't.
Kirsch's Reducing Noise and Hearing Placebo More Clearly was framed as a direct response to issues raised by commentators, but readers who expected that soon became frustrated by Kirsch's refusal to deal with his critics. In oblique response to the critics, he repeatedly relied on work by Walach and Maidof as corroboration for his analyses. Unfortunately, this meta-analysis was neither available to readers at the time nor peer-reviewed. Rather, it was scheduled to appear as a chapter in a then-forthcoming book edited by Kirsch. Basically, readers were being asked to accept this supposed corroboration of Kirsch's claims on faith, without being able to check it for themselves.
Kirsch briefly responded to Klein's detailed critique of the individual studies Kirsch had entered into the meta-analysis, but largely dismissed the criticisms and accused Klein of rejecting data simply because he didn't like them.
As for Dawes, Kirsch made a few passing comments and declared:
Instead of simply dismissing the data with "handwaving" about the assumption of additivity not being justified (Dawes, 1998), additivity should be assessed in balanced placebo studies of antidepressants. To my knowledge, this has never been done.
In his final response in this exchange, Klein was again highly detailed, citing specific claims in specific passages, challenging Kirsch's misrepresentation of Klein's arguments, and accusing him of inconsistencies and non sequiturs.
Kirsch ended the exchange by reiterating his claim that 75% of the response to an antidepressant is duplicated by placebo administration. He conceded that "Klein is correct in noting that I have not responded to many of his criticisms. Some of these do not seem worthy of response..."
We have to be careful about imposing contemporary standards on work done over a decade ago, but I think that even by the standards available back then, Kirsch's analyses were highly idiosyncratic and risked introducing bias, without the transparency that would have allowed readers to evaluate for themselves what was being claimed. Rating scales for assessing the likelihood of bias in meta-analyses had not yet been developed, but if one applies a widely accepted contemporary scale, it can readily be seen that Kirsch's meta-analysis lacked transparency and had a high likelihood of bias. It certainly would not get through peer review now, and it is hard to see that its peer review was adequate even by 1998 standards. In his replies, Kirsch's heavy dependence on an unpublished, non-peer-reviewed meta-analysis for corroboration of important points was symptomatic of a more basic problem: a failure to present evidence in a form open to independent review, which undermines basic credibility.
In his first response to critics, Kirsch expressed indignation that Klein would question whether Kirsch's target article had received adequate peer review, suggesting that this was an "affront to the work of the anonymous scholars who provided careful reviews of the original manuscript and to the judgment of the associate editor who accepted the revised manuscript for publication." I don't think skeptical sleuths should be deterred by this reaction. Taken together, the exchange demonstrates the usefulness, and even the necessity, of post-publication peer review. Furthermore, I think serious questions can be raised about the thoroughness of the review of the original target article, about whether the reviewers bothered to check Kirsch's interpretations against the original papers he selected, and certainly about whether there was any refereeing at all of the exchange with critics.
I engaged in a debate on the listservs with Irving Kirsch when his 2002 article had just come out in Prevention and Treatment. Reading this 1998 exchange reminded me of just how frustrating it was to argue with an ideologue, and it was validating to see that others had the same problem.