The story of how the STAR*D results were misreported has been coming together for some time now, step by step, and a paper recently published in Psychotherapy and Psychosomatics, along with a review of that paper published by Medscape Medical News on August 24, leads to an inescapable conclusion: This is a story of a scientific scandal, one that the public needs to know about.
The STAR*D trial, which was funded by the NIMH at a cost of $35 million and took six years to conduct, was touted as the "largest antidepressant effectiveness trial ever conducted." As it was designed to study treatment strategies for helping people recover and then stay well, with a one-year followup, it would produce results, the investigators announced at the start of the trial, that would have "substantial public health and scientific significance." As the public well knows now, pharmaceutical funding of antidepressant trials produced scientific literature that was biased and profoundly misleading, a tale of persistent scientific misconduct that has now been reviewed by many authors. But STAR*D was a publicly funded trial, and of course we would hope and expect that the results would be honestly reported.
So, with the new paper authored by Edmund Pigott, Allan Leventhal, Gregory Alter, and John Boren as a guide, let's go through the scientific sins. The results consisted primarily of two data sets, the percentage of patients whose depression remitted, and then the percentage of remitted patients who stayed well during the one-year followup, and thus we can review whether the NIMH and the STAR*D investigators accurately reported those results, and also disclosed the relevant data.
A. The percentage of patients whose depression fully remitted
What was reported
The STAR*D trial was designed to test whether a multistep, flexible use of medications could produce remission in a high percentage of depressed outpatients. Those who didn't get better with three months of initial treatment with an SSRI (citalopram) then entered a second stage of treatment, in which they were either put on a different antidepressant or given a second drug to augment an antidepressant. Those who failed to remit in step two could go on to a step three, and so on; in total, there were four treatment steps.
In a November 1, 2006, press release, the NIMH announced the positive news. "Over the course of all four levels, about 70% of those who did not withdraw from the study became symptom free."
In an article published at the same time in the American Journal of Psychiatry, the researchers -- in the abstract of the article -- told a similar story. "The overall cumulative remission rate was 67%," they wrote. In the text of the article, they did note that this was a "theoretical" remission rate, as "it assumes that those who exited the study would have had the same remission rates as those who stayed in the protocol."
Still, the 67% figure was the bottom-line message being communicated to physicians and the public, and in a paper published in 2007, titled "The STAR*D Project Results: A Comprehensive Review of Findings," the researchers emphasized this bottom line: "With all steps included, almost 70% of participants who remained in the study experienced remission. Patients and clinicians are encouraged not to give up."
The actual results
Now the investigators did publish charts with data on the number of patients who stayed in the trial and actually remitted, and after I plowed through those charts, I calculated that 1,854 of the 3,671 patients (50.5%) who entered the trial remitted at some point during these four steps of treatment. (I wrote about this in an earlier blog.) However, as Pigott and his collaborators make clear in their paper, even this percentage, from a scientific standpoint, is an inflated number.
When the investigators designed the study, they stated that the Hamilton Rating Scale for Depression (HRSD) would be the primary tool used to measure depressive symptoms, and that all patients, in order to be eligible for analysis, would have to have an entry HRSD score ≥ 14. Yet, their reported results strayed from those study parameters in two ways, both of which inflated remission rates.
First, during the trial, the researchers also used the Quick Inventory of Depressive Symptomatology-Self-Report (QIDS-SR) to periodically assess depressive symptoms. Higher remission rates were found with this assessment tool than with the HRSD scale, and the researchers then highlighted this higher remission rate in their published articles. Using the more lenient QIDS-SR scale, Pigott and his collaborators found, added more than 200 patients to the remitted group. (In essence, highlighting the QIDS-SR remission rates is a form of post-trial cherry-picking of data.)
Second, the investigators enrolled 607 patients into the study who had a baseline HRSD score < 14 (and thus were only mildly depressed), and, in several published reports, they included these patients when announcing a "67% cumulative remission rate," even though -- based on study criteria -- they were "ineligible" to be included in the analysis. Naturally, these mildly depressed patients were more likely to remit than those with higher baseline HRSD scores, and so including them in the published studies inflated the remission numbers.
In their paper, Pigott and his collaborators determined that there were 3,110 patients who began the study with an HRSD score of ≥ 14, and found that 1,192 of this group remitted during the study, based on an HRSD score of ≤ 7. Thus, if the study protocol had been followed and the results honestly reported, the researchers would have announced that 38% of the patients remitted during the four steps of treatment, and that the remaining 62% either dropped out or failed to remit.
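The two remission percentages discussed in this section can be checked directly from the patient counts quoted above. The following is only a minimal sketch of that arithmetic; the helper name `pct` and the variable names are mine, for illustration, and nothing here comes from the STAR*D publications beyond the four counts.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)

# Every patient who entered the trial, remission as tallied from the published charts
all_entrants = pct(1854, 3671)

# Only patients eligible under the protocol (entry HRSD >= 14),
# with remission defined on the pre-specified scale (HRSD <= 7)
per_protocol = pct(1192, 3110)

print(f"All entrants: {all_entrants}%")
print(f"Per protocol: {per_protocol}%")
```

Both figures sit well below the 67% that was placed in the abstract.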
B. The percentage of remitted patients who stayed well throughout a year of "continuing care"
What was reported
When the STAR*D investigators designed the study, they sought to maximize the stay-well rate during a one-year period of "continuing care." During this stage of the study, physicians could change the patients' medications, alter dosages, and add new medications. Patients were paid $25 each time they had their symptoms assessed, as it was thought this would help keep patients in the study.
In their reports, the STAR*D investigators announced that 33.5% to 50% of the remitted patients relapsed during this period of continuing care, with the lower percentage for those who remitted in stage one of treatment and the higher percentage for those who remitted in stage four of treatment. Thus, it seemed that a majority of the remitted patients had stayed well, which was fairly encouraging. And if you did a rough back-of-the-envelope calculation -- multiplying the percentage of patients who remitted by the stay-well rate during the followup -- it appeared that the 12-month stay-well rate for all of the patients who had entered the trial was around 40%.
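That back-of-the-envelope calculation can be written out as follows. This is a rough sketch of the reader's inference, not anything the investigators published; it simply multiplies the reported 67% remission rate by the stay-well rates implied by the reported relapse range, and the variable names are my own.

```python
reported_remission = 0.67                  # the "theoretical" cumulative remission rate
relapse_low, relapse_high = 0.335, 0.50    # reported relapse range during followup

# Stay-well rate is 1 minus the relapse rate
best_case = reported_remission * (1 - relapse_low)
worst_case = reported_remission * (1 - relapse_high)

print(f"Implied 12-month stay-well rate: {worst_case:.1%} to {best_case:.1%}")
```

The implied range brackets the "around 40%" impression a reader would have taken away from the published reports.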
The actual results
When I was researching and writing Anatomy of an Epidemic, I did my best to figure out the precise number of patients who remitted and stayed well throughout the trial. In particular, I puzzled over "figure 3" on page 1913 of a 2006 article on long-term outcomes, as it appeared the numbers might be there, but the data was presented in such a confusing manner that I gave up. Ultimately, all I could determine was that of the 3,671 patients in the trial (including the 607 who had a baseline HRSD score < 14), 737 had remitted and then reported, at some point during the 12-month followup, they were still well. The remaining 80% of the patients had either never remitted, relapsed during the followup, or dropped out at some point.
This was not such an encouraging number, but, it turned out, my calculations were once again too kind. In a 2009 paper, Allan Leventhal and David Antonuccio were able to make sense of that mysterious graphic on page 1913, and they reported that only 108 patients -- out of the initial cohort of 3,671 -- had a "sustained remission." In other words, only 3% of the patients who entered the trial remitted, and then stayed well and in the trial during the year-long followup.
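For readers who want to verify the two stay-well figures, here is the same kind of check, again only a sketch built from the counts cited in this post, with illustrative variable names of my own.

```python
entered_trial = 3671

# My earlier tally: remitted and reported being well at some point in followup
well_at_some_point = 737
# Leventhal and Antonuccio's count of patients with a "sustained remission"
sustained_remission = 108

share_well_at_some_point = round(100 * well_at_some_point / entered_trial, 1)
share_sustained = round(100 * sustained_remission / entered_trial, 1)

print(f"Well at some point in followup: {share_well_at_some_point}%")
print(f"Sustained remission: {share_sustained}%")
```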
But, as Pigott and his collaborators explain, even this number may be a bit high. They noted that many of the 108 stay-well patients may have come from the group of 607 patients who had a baseline HRSD score < 14, and shouldn't have been included in the analysis in the first place. Moreover, since relapse was defined as an HRSD score ≥ 14, it was possible that some of the 108 patients actually had a higher HRSD score during the followup period (say a score of 13) than they did at baseline (say a score of 12), and yet would still have been reported as having remitted and stayed well throughout the 12-month period.
Why This Is A Scandal
This is my fourth post on the STAR*D results, and thus it may seem I am a bit obsessed about the study. And in a sense, I am. This was a publicly funded study, and the bottom-line message conveyed to the doctors and to the public was that it had shown that antidepressants enabled 67% of depressed outpatients to recover. That's what The New Yorker reported in an article published on March 1 of this year, adding that this "effectiveness rate" was "far better than the rate achieved by a placebo." Now let's sum up the scientific sins used to create that false impression:
• The STAR*D investigators reported a "cumulative" remission rate of 67% in the abstract of an article, when in fact this was simply a "theoretical" rate.
• They reported remission rates based on the QIDS-SR scale, even though the pre-specified primary outcome scale was the HRSD, and this switch inflated the remission numbers.
• They included remission numbers for patients who weren't depressed enough at baseline to meet study criteria, and thus weren't eligible for analysis.
• They reported that 33.5% to 50% of remitted patients relapsed during the 12-month followup, which suggested -- when combined with the inflated 67% remitted rate -- that perhaps 40% of all patients who entered the trial had recovered and stayed well, when in fact only 3% of the entering patients had a "sustained remission" (and stayed in the trial).
As Medscape Medical News noted, the real results "point to a lack of long-term efficacy for antidepressants." But the fake results pointed to an effectiveness rate that was "far better" than placebo.
A STAR*D Investigator Responds
In her article, Medscape Medical News writer Deborah Brauser asked STAR*D investigator Maurizio Fava, who is a prominent psychiatrist from Massachusetts General Hospital, whether the published analysis by Pigott and his collaborators was correct. "I think their analysis is reasonable and not incompatible with what we had reported," he said.
His answer is revealing for two reasons. First, he is acknowledging that the low remission and stay-well rates reported by the Pigott group are accurate. Those are indeed the real results. Second, he is acknowledging that the STAR*D investigators knew this all along, and that, in fact, this information was in their published reports. And in a sense, that is true. If you dug through all of the published articles, and spent weeks and months reading the text carefully and intently studying all the data charts, then maybe, at long last, you could -- like Pigott's group -- ferret out the real results.
But that is not the way that honest science is supposed to work.