In my last blog, I pointed out that many readers do not bother to read further than an abstract before forming an impression of a study that becomes relatively fixed and is even cited in subsequent work. Whether this is good practice is beside the point; it happens. Hyped and inaccurate abstracts perpetuate myths and misinformation and contaminate other sources.
If we relied solely on published abstracts to form our impression of the efficacy of psychotherapy and psychosocial interventions, we would come to the conclusion that almost all interventions are effective and ready to be disseminated into the community. Yet, most interventions are not impressively better than alternatives.
There are calls for journals to ensure greater accuracy in abstracts, but being more accurate conflicts with strong pressures on authors to put their best foot forward, to accentuate the positive, and even to outright distort their findings in order to get their papers published and cited and to protect grant funding.
In the last blog, I also provided a validated checklist for evaluating the adequacy of abstracts and promised to apply it. In this blog I am delivering on that promise: I take the abstract of a particular study and compare it to the study's actual results.
Nina Heinrichs, Tanja Zimmermann, Brigit Huber, Peter Hershbach, Daniel Russell et al. Cancer Distress Reduction with a Couple-Based Skills Training: a Randomized Controlled Trial. Annals of Behavioral Medicine. (2012) 43:229-252.
Although I am critically examining the abstract of this particular paper, I am not assuming that it is especially misleading or inaccurate relative to what is available in the literature. There's ample evidence that confirmatory bias is widespread and accepted in reporting the results of studies, with many psychologists admitting to multiple looks at the data, suppression of negative findings, and highlighting of the positive. My goal is simply to raise readers' level of skepticism about the accuracy of abstracts and to suggest the need not to depend on them as representing what actually appears in the text of articles or what was actually found in the research.
Readers who wish to make their own evaluations can stop here for now. Before proceeding, they can download a PDF of the article if they have access to a university electronic library with a subscription to the journal. If they cannot do that, they can send an e-mail to the corresponding author of the study and ask for one: email@example.com.
Please recall that the checklist does not evaluate the accuracy of information but only whether particular information is provided in an abstract. But we will use it as a guide in going further in evaluating the accuracy.
What the Abstract Says
The abstract states that this study investigated the usefulness of providing skills to couples in which the woman had been recently diagnosed with breast or gynecological cancer. The study is noteworthy in focusing on the couple, not just the woman with cancer, and it evaluates whether this focus is efficient and effective. The description of results in the abstract suggests that it is decidedly so: women assigned to the intervention, rather than to the control group—
…showed larger reductions in fear of progression, and couples reported less avoidance in dealing with cancer, more posttraumatic growth, and better relationship skills.
The abstract acknowledges that all advantages of the intervention disappeared by 16 months after diagnosis, but nonetheless concludes:
Short-term changes in functioning may be improved by enhancing couples' dyadic skills during active medical treatment of the disease.
Despite the disclosure that any changes were only short lived, this sounds like a strong endorsement of a promising intervention.
Of all the items on the checklist for evaluating abstracts, I pay particular attention to four items:
(a) whether the abstract clearly identifies a primary outcome (no more than one or two) that can be used to decide whether the intervention is effective;
(b) the number of participants who are analyzed;
(c) the results for each group and the estimated effect sizes and precision; and
(d) the general interpretation of results.
My strategy is to compare what is reported in the abstract to information available in the text of the rest of the article.
Why are these particular items important? Hyped and inaccurate reporting frequently starts with failure to identify ahead of time the study’s primary outcome. An all too common practice is to assess a number of outcomes and then report only those that make the intervention look best. And it is not enough for abstracts to simply report that the change for the intervention group was greater than the change for the control group. Lack of precise details about effect sizes often hides the fact that results are modest or even nonsignificant. Next, results need to be based on all participants who were randomized. Failure to include all participants in analyses can create bias and defeats the purpose of having conducted a randomized trial. It is typically not random that data for some participants are missing. Finally, the general interpretation of results in the abstract is often where the lipstick is put on a pig of a study, with inaccuracies in the rest of the abstract put to use in creating a picture of a positive trial and an intervention that is ready to be disseminated and put into wide practice.
What I found delving into the text of the article...
Right away there is evidence that the abstract is not telling the whole story and may be putting a positive spin on things. The title mentions "cancer distress reduction" but the abstract does not seem to mention outcomes for this variable. Turning to the Method section, it is clear that more outcomes were assessed than were reported in the abstract. None were singled out as most important, but all of the following were assessed: cancer specific concerns, fear of progression, and avoidance-defense, but also posttraumatic growth, quality of marriage, communication, and dyadic coping. In addition to these self-report measures, there were behavioral observation measures of nurturance and behavioral support. So, there were nine outcome measures, with no clear differentiation of which were primary, but the abstract reported results for five.
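To get a rough sense of why nine outcomes with no designated primary one is a problem, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the nine measures are statistically independent and that each is tested at the conventional 0.05 level. Neither assumption comes from the article, and real outcome measures are correlated, so the true figure would differ. The function name is my own.

```python
# Illustrative only: chance of at least one "significant" result by luck alone
# when several outcomes are tested and none is pre-specified as primary.
# Assumptions (not from the article): independent tests, alpha = 0.05 each.

def chance_of_false_positive(n_outcomes: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent tests comes out
    significant when the intervention truly has no effect on any outcome."""
    return 1.0 - (1.0 - alpha) ** n_outcomes


if __name__ == "__main__":
    for n in (1, 2, 9):
        print(f"{n} outcome(s): {chance_of_false_positive(n):.2f}")
```

Under these simplifying assumptions, nine outcomes give roughly a 37% chance of at least one spuriously significant finding, compared with 5% for a single pre-specified primary outcome. That is why the checklist asks for no more than one or two primary outcomes.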
If you read the results section carefully and examine the numbers in the tables, you can find evidence that something went wrong with the study, but I suspect most readers will get through this and not notice. We will come back for a closer look.
But first, there are disclosures sprinkled in the Discussion that should raise concerns and be enough to send the reader scrambling back to read the results more carefully. Immediately after a glowing summary of the findings of the study is presented, the authors disclose:
"The baseline differences in cancer related stress and fear of progression limits [sic] unbiased interpretation of the data. In fact, the baseline differences and two of out of 16 possible differences may indicate failed randomization.… Therefore, we conclude that the significant differences in cancer related distress that emerged between the intervention groups are likely caused by the baseline differences and not by differential effect of the couple intervention. In contrast, the significant difference in fear of progression that emerged between the intervention groups are likely caused by differential effect of a couple intervention and not by the baseline difference.…"
There is also a suggestion that the apparent improvement in communication skills in the post intervention follow-up was a result of couples with poor skills no longer being available.
Our suspicions raised, let's go back to the Results section. The important thing to look for in Table 3 is whether time x group interactions are reported to be significant. That is what is needed to argue that differences between the intervention and control group changed over time. The hope is to show that there is a benefit to the intervention because at randomization the groups did not differ, but then began to differ as the effects of the intervention took hold. Figure 1 shows a situation in which that occurs. However, a significant time by group interaction is not meaningful in itself; what matters is the pattern that produces it. Figures 2 and 3 show what actually occurred in this study.
What happened is that the groups differed initially and that the differences disappeared with the passage of time. That is quite odd. How do we make sense of such results? It could be that the intervention closed the gap between the groups, or it could simply be that the higher initial cancer related distress and fear of progression dissipated over time. Regardless, the abstract does not acknowledge any problem.
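The point that a time by group interaction can arise from very different patterns can be made concrete with a small sketch. The group means below are entirely hypothetical (they are not taken from the article's tables), and I assume a distress scale on which lower scores are better and an intervention group that starts out higher in the second scenario. The function and scenario names are my own.

```python
# Hypothetical group means (NOT from the article) on a distress scale where
# lower is better. Each group maps to (baseline mean, follow-up mean).
scenarios = {
    "groups equal at baseline, then diverge": {
        "intervention": (20.0, 14.0),
        "control": (20.0, 18.0),
    },
    "groups differ at baseline, then converge": {
        "intervention": (24.0, 18.0),
        "control": (20.0, 18.0),
    },
}


def interaction_contrast(groups):
    """Difference in change scores: intervention change minus control change."""
    i_pre, i_post = groups["intervention"]
    c_pre, c_post = groups["control"]
    return (i_post - i_pre) - (c_post - c_pre)


def baseline_gap(groups):
    """Intervention mean minus control mean at baseline."""
    return groups["intervention"][0] - groups["control"][0]


def followup_gap(groups):
    """Intervention mean minus control mean at follow-up."""
    return groups["intervention"][1] - groups["control"][1]


if __name__ == "__main__":
    for name, groups in scenarios.items():
        print(name)
        print("  interaction contrast:", interaction_contrast(groups))
        print("  gap at baseline:     ", baseline_gap(groups))
        print("  gap at follow-up:    ", followup_gap(groups))
```

With these made-up numbers, both scenarios yield the same interaction contrast, yet only the first ends with the intervention group better off than the control group. In the second, the groups simply end up in the same place, which is equally consistent with the initially higher scores drifting back down on their own.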
The authors apparently tried to control for initial differences in some of their analyses, but did not say why they thought this was needed. One would expect that with randomization, there is no need to control for initial differences because there are no such differences. And when there are differences, trying to control for them can produce misleading results. Consider an absurd experiment comparing performance of college graduates to high school dropouts on a test of intellectual ability. It would not accomplish anything to control for amount of education and conclude that the groups were no different, once education was controlled. Trying to control for education creates a counterfactual situation in which high school dropouts and college graduates have the same level of education.
I think we have to conclude that there is very biased reporting in this abstract. There is no basis for claiming the intervention improved either cancer related problems or fear of progression. There should have been more candor about the trouble recruiting patients, unexpected differences after patients had supposedly been randomized, and other differences due to selective retention. The concluding statement of the abstract is clearly unwarranted: the study does not produce evidence that short-term changes in functioning may be improved by enhancing couples' dyadic skills.
It's somewhat difficult to construct an accurate picture of how much couples were interested in this intervention, but it appears that few couples were interested and that most who were interested were already functioning well. The beginning of the Method section states that the investigators received 298 addresses of interested couples from participating hospitals and that 58% were not interested in study participation, the primary reason being that the couples viewed themselves as functioning well and not needing assistance. This section concludes with a figure of 90, or 34% of eligible couples who were contacted, being enrolled in the study, but only 72 attending even one session. It is only in the Discussion section that we are told that the recruitment rate was 13% and that most participating couples already had good relationships and communication skills.
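One way to see why the recruitment picture is hard to piece together is to work backwards from the reported figures. The sketch below uses only the numbers quoted above. Everything derived from them is my own arithmetic, not something the authors report, and it assumes that the 34% and 13% rates both refer to the 90 enrolled couples. The variable names are mine.

```python
# Figures as quoted above from the article.
addresses_received = 298
share_not_interested = 0.58
enrolled = 90
enrolled_share_reported = 0.34      # "34% of eligible couples who were contacted"
attended_at_least_one_session = 72
recruitment_rate_reported = 0.13    # stated only in the Discussion

# Derived values: my own arithmetic, NOT numbers reported by the authors.
not_interested = addresses_received * share_not_interested
still_possible = addresses_received - not_interested

# Assumption: both percentages have the 90 enrolled couples as numerator.
implied_pool_for_34_percent = enrolled / enrolled_share_reported
implied_pool_for_13_percent = enrolled / recruitment_rate_reported

attendance_among_enrolled = attended_at_least_one_session / enrolled

if __name__ == "__main__":
    print(f"Not interested (of 298):     about {not_interested:.0f}")
    print(f"Remaining after refusals:    about {still_possible:.0f}")
    print(f"Pool implied by 34% rate:    about {implied_pool_for_34_percent:.0f}")
    print(f"Pool implied by 13% rate:    about {implied_pool_for_13_percent:.0f}")
    print(f"Enrolled who attended >= 1:  {attendance_among_enrolled:.0%}")
```

If that assumption holds, the 34% figure implies a pool of about 265 couples and the 13% figure a pool of about 692, so the two rates must have been calculated against quite different denominators. Readers are left to guess which denominator best describes real interest in the intervention.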
There are other problems in making sense of this trial. Despite having administered a full battery of outcome assessments, the investigators really cannot make a good case that they are dealing with clinically significant levels of problems. The authors did not rely on well-validated measures that allow estimates of the extent to which the numbers reported represent clinically significant problems or change. It has not even been determined how one measure in particular, posttraumatic growth, is related to good adaptation as assessed by other measures. Cancer patients’ claims to have grown from the experience have not been shown to be accurate (they are most likely due to distorted recollections of what they were like before cancer), and reports of such growth are not necessarily related to better adjustment.
Who is to blame here for a misleading abstract? Inaccuracies in published abstracts reflect not only on the authors, but on the journals and reviewers. I think a case could be made from this abstract that the editor and reviewers were asleep at the wheel and failed to catch some things that were there to be found, even if not necessarily obvious.
Overall, the actual report of the study, unlike the abstract, suggests the intervention is hardly ready to be packaged and disseminated into routine care. If we accept that cancer related problems or fear of progression were actually the primary outcomes of the study, they may have been bad choices because there is a natural tendency for these issues to improve without intervention. At the risk of overgeneralizing, the study suggests there may be considerable difficulties recruiting for couple-oriented interventions at the end of cancer care or immediately afterwards. The problem could be that couples are not able to accommodate the burden of additional visits after the woman has just finished cancer treatment, or it may be that many couples remain unconvinced that cancer is a couples issue they are handling badly. Regardless, researchers considering couple interventions would do well to do some basic patient-oriented pilot work in which they determine levels of interest, what kind of couples are interested, and what intensity of intervention couples are willing and able to accommodate in their lives.