[Added March 20, 2012. I am quite pleased by the thousands of visitors this pair of blog posts on Kirsch has attracted. But I am struck by how unwilling the psychologists commenting are to suspend their tribal loyalties and preconceived notions and consider whether, when evaluated by the same standards as antidepressants, the performance of psychotherapy is just as disappointing. I see evidence-based appraisal being practiced as a form of advocacy, with standards selectively applied and data selectively chosen, as if lawyers were trying to prove their case, rather than judges carefully weighing the evidence. A sad, sad reflection on the field.]
Following her interview with psychologist Irving Kirsch, CBS News correspondent Lesley Stahl admitted: "I walked away really confused." Undoubtedly, many in the viewing audience, and even professionals, were confused by what they had watched.
In well-phrased soundbites, psychologist Irving Kirsch repeated his familiar claims that his research challenged the very effectiveness of antidepressants.
Irving Kirsch: The difference between the effect of a placebo and the effect of an antidepressant is minimal for most people.
Lesley Stahl: So you're saying if they took a sugar pill, they'd have the same effect?
Irving Kirsch: They'd have almost as large an effect and whatever difference there would be would be clinically insignificant.
Are the 17 million Americans taking antidepressants to relieve their mood disorders misguided? The inept response from the American Psychiatric Association in the following days was not particularly reassuring, unless one accepts the authority of psychiatrists without question.
"...not just wrong, but irresponsible and dangerous reporting," said the President of the American Psychiatric Association John Oldham, M.D.
If the effects of antidepressants are "clinically insignificant," what does Kirsch propose in their place? In a recent article he recommended alternative treatments be exhausted, presumably psychotherapy, before prescribing medication to depressed persons. And from his recent book:
Psychotherapy works for the treatment of depression, and the benefits are substantial. In head-to-head comparisons, in which the short-term effects of psychotherapy and antidepressants are pitted against each other, psychotherapy works as well as medication. This is true regardless of how depressed the person is to begin with.
Only as well? Does that mean that psychotherapy is no better than sugar pills? How could the effects of antidepressants be "clinically insignificant," while the effects of psychotherapy are "substantial" when they are just as good as medication? What if we applied his criterion evenhandedly to both antidepressants and psychotherapy for depression?
Clearly, previous estimates of the efficacy of antidepressants versus pill placebos based on meta-analyses of published clinical trials have been exaggerated. That was shown by former FDA employee and psychiatrist Erick Turner and colleagues, who extracted data for 12 antidepressants approved by the FDA between 1987 and 2004, starting with fluoxetine.
In a 2008 New England Journal of Medicine article, they compared the FDA's regulatory decisions to what was reported in the literature. According to FDA analyses, only half of the trials were positive, but according to the published journal articles, almost all of the trials were positive. In almost a dozen cases, trials that were negative or questionable according to the FDA were published as if they were positive. Once unpublished reports were taken into account, the overall effect size (ES) for antidepressants relative to placebo was reduced from 0.41 to 0.31, where ES is the standardized mean difference in improvement. However, in all cases, antidepressants were significantly superior to pill placebos. So, Turner and colleagues found that antidepressants were not what they are cracked up to be, but they would not go so far as to dismiss them as ineffective. As Turner was later quoted, "the glass is far from full but far from empty."
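For readers who want to see what a "standardized mean difference" amounts to, here is a minimal sketch. It assumes the common pooled-standard-deviation form of the statistic (Cohen's d); the function name and the group means, SDs, and sample sizes are hypothetical numbers of my own choosing, not data from Turner's or anyone else's trials.

```python
from math import sqrt


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Difference in mean improvement divided by the pooled standard deviation.

    This is the usual Cohen's d form; it is an assumption that the analyses
    discussed in the post used exactly this variant.
    """
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_var)


# Hypothetical trial: mean drop on a depression rating scale in each arm.
es = standardized_mean_difference(
    mean_t=10.0, sd_t=8.0, n_t=100,  # drug arm (made-up numbers)
    mean_c=7.5, sd_c=8.0, n_c=100,   # placebo arm (made-up numbers)
)
print(f"ES = {es:.2f}")
```

With these made-up numbers, a 2.5-point advantage on a scale where patients vary by about 8 points works out to an ES of roughly 0.31, which is the order of magnitude under dispute here.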
Shortly afterwards, Irving Kirsch and colleagues drew upon FDA data for four of the 12 drugs examined by Turner and his colleagues: the SSRIs paroxetine and fluoxetine, the 5HT antagonist and weak SSRI nefazodone, and the SNRI venlafaxine. They found essentially the same effect size, ES = 0.32, as Turner's group. However, Kirsch and co-authors applied the criterion of ES = 0.50 for clinical significance and judged the effects of antidepressants as not being clinically significant. They cited the UK National Institute for Clinical Excellence (NICE) standard, but by the time of Kirsch's interview on 60 Minutes, the standard of ES = 0.50 had quietly been abandoned by NICE. We are left guessing why, but maybe, with its new turn to examining what actually happens with treatment in the community, not just in clinical trials, someone recognized that an ES = 0.50 is not often attained.
I emailed a number of prominent psychiatrists and health services researchers in the UK, asking them about the former NICE criterion of ES = 0.50. Simon Gilbody replied: "I never believe any study these days which finds an ES > 0.5. it usually means the science is wrong and we haven't found the bias(es) just yet."
Critics questioned Kirsch and colleagues' decision to pool the data from the particular four antidepressants because venlafaxine was widely regarded as being more effective than SSRIs. Moreover, Bristol-Myers Squibb withdrew nefazodone from the US and Canadian markets in 2004. Re-analyses of the same data found that both venlafaxine and paroxetine surpassed the NICE criterion of ES = 0.50.
In an invited editorial for BMJ, Turner commented upon the contrast between Kirsch's interpretation and his own of essentially identical effect sizes. He noted that the NICE criterion of an effect size of 0.5 had been taken from Jacob Cohen's designation of it as "medium," but Cohen had distanced himself from his own designation of effect sizes as "small," "medium," or "large," writing, "The values chosen had no more reliable a basis than my own intuition." Turner suggested that Cohen would undoubtedly have rejected NICE's rigid use of 0.5 as a categorical cutoff between ineffective and effective treatments. Indeed, Cohen, who is now deceased, offered no hint that he would have welcomed becoming the arbiter of categorical judgments of clinical significance, and he famously lampooned the categorical p < .05 as a basis for deciding the world is flat rather than round.
Turner went on:
It seems unfair that pharmacological, and not psychotherapeutic, treatment has become the usual first line approach to depression merely for economic reasons. But before we embrace any treatment as first line, it is prudent to ask whether its efficacy is beyond question. For psychotherapy trials, there is no equivalent of the FDA whose records we can examine, so how can we be sure that selective publication is not occurring here as well?
It is noteworthy that the producer for the CBS 60 Minutes program, Rich Bonin, approached Turner about appearing on the program, apparently after his team had interviewed Kirsch. When Turner explained why he thought that matters were not as simple as Kirsch had presented, the CBS producer began thinking out loud with his co-producer about whether they should proceed with the story, "or is it too murky... Is it good murky, or is it just murky?" The question was resolved by leaving Turner out of the program. Better a dramatic, even if inaccurate, message than a "murky" one.
What if we applied Kirsch's standards to psychotherapy? Unfortunately, there is no FDA-like registry of psychotherapy trials, so we cannot be sure we have accessed all unpublished, as well as published, psychotherapy studies. We do know there's ample evidence of publication bias, even if we can't quantify it with much precision. Rosenthal's calculation of the extent of "file drawer problems" is generally rejected outside of psychology, and so we should be skeptical of estimates of how many trials need to be left in file drawers in order for estimates of the efficacy of psychotherapy to be revised.
Just as published reports of the efficacy of antidepressants can be shown to be exaggerated, so too, reports of the efficacy of psychotherapy have been exaggerated, particularly when they are largely based on the many trials conducted by developers and promoters of particular therapies. There's ample evidence of publication bias, although the exact extent of it is not known. There is also evidence of confirmatory bias in published studies, of selective reporting of outcomes, and that investigator allegiance is more predictive of the outcome of a psychotherapy trial than what the therapy is being compared to. For instance, a meta-analysis of psychotherapy for depression had to exclude trials of a particular problem-solving therapy conducted by its originator because they were such extreme outliers. I am sure more such examples out there await discovery.
Dutch psychologist Pim Cuijpers has shown that when you take quality of psychotherapy trials into account, estimates of effect size plummet. An overall effect size of 0.68 grows to 0.74 when only low-quality trials are examined, but then drops to 0.22 when only high-quality trials are considered. Cuijpers also showed that differences between particular psychotherapies and active treatment conditions are rarely above 0.20, well below Kirsch's standard of 0.50.
In one of my recent blogs, I argued that pill placebo conditions are not inert because they involve arousing positive expectations and providing considerable support to patients. We probably cannot expect sugar pills to perform well without the positive expectations and social support, but we cannot find many comparisons of psychotherapies to pill placebo conditions in the literature. My colleagues and I conducted such a clinical trial with mildly depressed patients from the community and found significant, but only modest, differences between psychotherapy and pill placebo, ES = 0.34. Currently, we are working on a meta-analysis of psychotherapy versus pill placebo, but we shouldn't expect the differences to even approach Kirsch's 0.50, given the effect sizes found in the individual trials that are available for entry into the meta-analysis. Because pill placebo should be considered an active treatment condition, we should expect differences from psychotherapy to be smaller than when psychotherapy is compared to a waitlist or no-treatment control. Consistent with this argument, Stefan Hofmann showed that cognitive behavior therapy for anxiety disorders was superior to control conditions, but that cognitive behavior therapy and pill placebo were not significantly different. But he only had a few comparisons to work with.
The bottom line is that if we applied Kirsch's criterion to psychotherapy in an evenhanded fashion, we would have to accept that psychotherapy has only clinically insignificant differences from a sugar pill. But in the midst of the raging culture wars pitting antidepressants against psychotherapy and psychiatrists against psychologists and social workers, nobody really wants to make that claim, certainly not Kirsch. So maybe we need to reconsider his evaluation of antidepressants.
The 60 Minutes program about antidepressants has created quite a stir, and most of my fellow psychologists have been wildly enthusiastic about Kirsch's claims, endorsing them without giving them a needed critical look. That includes many who are outspoken about their commitment to being evidence-based skeptics. In the end, they are proving more loyal to their guild than to being evidence-based or evenhanded. Regardless, in the current culture wars, I'm not confident that many psychologists can be encouraged to take a critical look at Kirsch's claims or that they will be persuaded by evidence contrary to their position on the superiority of psychotherapy.
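To make the evenhanded application concrete, here is a small sketch that holds every effect size quoted in this post up against the same ES = 0.50 cutoff. The labels and the helper name are mine, and the comparison conditions behind these numbers differ (pill placebo in some cases, assorted control conditions in others), so treat it as a bookkeeping exercise rather than a meta-analysis.

```python
KIRSCH_THRESHOLD = 0.50  # the former NICE criterion Kirsch applied

# Effect sizes as quoted in this post; labels are shorthand of my own.
effect_sizes = {
    "antidepressants, published trials (Turner)": 0.41,
    "antidepressants, FDA data (Turner)": 0.31,
    "antidepressants, four drugs (Kirsch)": 0.32,
    "psychotherapy, all trials (Cuijpers)": 0.68,
    "psychotherapy, low-quality trials (Cuijpers)": 0.74,
    "psychotherapy, high-quality trials (Cuijpers)": 0.22,
    "psychotherapy vs pill placebo (our trial)": 0.34,
}


def meets_criterion(es, threshold=KIRSCH_THRESHOLD):
    """Return True if the effect size reaches the cutoff for 'clinical significance'."""
    return es >= threshold


verdicts = {label: meets_criterion(es) for label, es in effect_sizes.items()}

for label, passed in verdicts.items():
    status = "meets" if passed else "falls short of"
    print(f"{label}: ES = {effect_sizes[label]:.2f} {status} {KIRSCH_THRESHOLD:.2f}")
```

Only the psychotherapy estimates that include low-quality trials clear the bar. Once the estimate is restricted to high-quality trials, or the comparison is a pill placebo, psychotherapy falls short just as antidepressants do.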
Antidepressants versus psychotherapy is an emotional issue around which battle lines are drawn and the troops deeply dug in. Evidence is immediately classified as either for or against, and judged before it is carefully evaluated... Erick Turner fittingly quotes George Bush employing the false dilemma fallacy: "either you're with us, or you're with the terrorists."
But if we are true to the principles of evidence-based practice, we need to overcome the strong inclination to ignore the implications of an evenhanded application of Kirsch's criterion, even if we admit that this criterion is somewhat arbitrary. My take-away message is that, on average, no treatment for depression is entirely satisfactory, but individual patients may nonetheless demonstrate improvement.
In a future blog, I will take a critical look at Kirsch's other claim that the superiority of antidepressants over pill placebo emerges only with severe depression. But for now, I think it's important to point out that what we mean by "severity" is not adequately encompassed by a score on a single rating scale. Severity can also involve length of current episode and the severity and number of past episodes. If people have experienced severe depression in the past, I don't think it is necessarily advisable for them to wait until they reach some arbitrary point on a rating scale before getting attention. And as was shown by our clinical trial with mildly depressed patients referred by community physicians, both behavior therapy and antidepressants benefit some mildly depressed patients. The trick is figuring out which individual patients, and deciding what to do until we can figure them out. As is more commonly recognized in Europe than in the United States, much mild depression in the community resolves itself without intensive intervention like medication or psychotherapy. In Europe, the preferred strategy is to exhaust simpler alternatives before initiating medication or psychotherapy, and this makes particular sense if people have no previous history of depression.