James C. Coyne Ph.D.

The Skeptical Sleuth

Faux Evidence-Based Behavioral Medicine Part 2

Serving professional interests, ignoring needs of cancer patients in pain

Posted Mar 27, 2012

In my last blog I discussed how professional organizations create the appearance that the services their members offer are evidence-based and then take steps to marginalize and silence critics and skeptics. I began discussing the example of a Society of Behavioral Medicine (SBM) meta-analysis concerning interventions for cancer pain. In this blog, I continue showing how that meta-analysis was flawed and misleading and why it should be considered an advocacy piece rather than a balanced consideration of best evidence.

I was informed that I had upset the leadership of SBM by labeling the meta-analysis as "commissioned by SBM," earning myself a telephone call from the President-elect. So, I will stick with the first author's designation of it as being "sponsored" by SBM, which occurs in an e-mail that is otherwise fascinating in what it reveals about the politics of generating attention for this meta-analysis at the SBM convention. You figure out the difference between "commissioned" and "sponsored."

I will show that the write-up of the meta-analysis lacked the transparency needed for an independent evaluation; that the authors relied too heavily on poor-quality, small studies; made key mistakes in selecting studies and in lumping and splitting interventions into categories; and came to faulty and premature conclusions. But more basically and more disappointingly, the authors failed to situate the meta-analysis in the larger context of exceedingly poor control of pain among cancer patients in America, even when medications of proven efficacy are readily available.

Ain't Necessarily So


In the Gershwins' opera Porgy and Bess, the character Sportin' Life sings It Ain't Necessarily So to express his doubts about some statements in the Bible. We borrowed the title of the song for an article discussing serious limitations in the first four Evidence-Based Treatment Reviews that had been sponsored by the Society of Behavioral Medicine's Evidence-Based Behavioral Medicine Committee.

Our article was invited by editor Bob Kaplan, but was sandbagged in review. One angry reviewer condemned the manuscript with

"[The authors'] frustration with the work of others is not enough to appoint themselves as the Supreme Judges of the work of others - however flawed this work might be."

Bob Kaplan saw what was going on and saved our manuscript from the reviewers, but then the authors whom we criticized requested time to respond. Fair enough, but no response was forthcoming and they simply delayed publication of the paper.

Serious problems in the field were reflected in the fact that any of the four SBM meta-analyses made it through peer review at all. We viewed our article as a wake-up call, and we tried to provide guidelines for relatively quick evaluations of meta-analyses. However, in the case of the four meta-analyses, anyone using these guidelines to look for flaws would be thwarted by a lack of transparency and outright inaccuracies in what was presented in the meta-analysis articles. Applying the criteria required going back to the original studies, no small task and beyond the reach of many consumers who lack the time, knowledge, or university library access through which the articles can be obtained without cost.

The curse of small, poorly designed studies

One of our chief concerns was that meta-analyses in behavioral medicine relied on studies that were too small and that shared similar problems in quality, problems that were compounded, rather than eliminated, when their results were combined into a single summary effect size.

Small studies suffer strong publication bias: failures to obtain positive findings go unpublished because the studies are too small, while positive findings get widely cited on the perception that they are all the more exciting because of the small samples. Small studies require a larger effect size to attain statistical significance, and so published results tend to be exaggerated and not to be replicated in later studies that are larger and of better quality. Small studies are particularly vulnerable to the loss of even a few patients to follow-up and to investigators knowing to which condition patients are assigned. Notoriously, with small studies, investigators can naïvely or deliberately monitor incoming data and stop the trial when a positive finding has been obtained, even when it is a chance finding that would be undone with continued accumulation of patients.

In Ain't Necessarily So, we presented detailed arguments for requiring that studies have at least 35 patients in the smaller of the intervention or control groups. We actually showed that a threshold of at least 50 patients was better justified, but holding studies to that standard would have left all four SBM meta-analyses with too few studies to proceed. Interested readers can consult our article and the references it cites, as well as the slide presentation that I have made available; I will be applying the 35-patient criterion in this blog.
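To make the arithmetic behind this concern concrete, here is a minimal sketch, entirely my own illustration and not part of the meta-analysis, that uses the statsmodels power routines to ask what standardized effect size a two-group trial can reliably detect at different per-group sample sizes:

```python
# A rough illustration of why small trials tend to report exaggerated effects:
# the smaller the groups, the larger the effect has to be before it can reach
# statistical significance with conventional power. The sample sizes include
# the 35- and 50-patient thresholds discussed above.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n_per_group in (10, 20, 35, 50, 100):
    # Smallest Cohen's d detectable with 80% power in a two-sided,
    # two-group comparison at alpha = .05.
    d = analysis.solve_power(effect_size=None, nobs1=n_per_group, ratio=1.0,
                             alpha=0.05, power=0.8, alternative='two-sided')
    print(f"n = {n_per_group:3d} per group -> minimum detectable d ~ {d:.2f}")
```

With only 10 or 20 patients per group, nothing short of a very large effect can reach significance, which is exactly why the small positive trials that do get published tend to overstate what an intervention can deliver.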

What the SBM authors claimed about psychosocial interventions for cancer pain

The authors boasted of "robust findings" of "substantial rigor" in a meta-analysis that provided "strong evidence for psychosocial pain management approaches." They claimed their findings supported the "systematic implementation" of these techniques. They estimated that it would take 812 unpublished studies lurking in file drawers to change their assessment. Intimidating perhaps, but most experts on meta-analysis now dismiss this so-called "file drawer" or "failsafe N" statistic out of hand as unreliable and often grossly exaggerated.
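For readers unfamiliar with the statistic, here is a minimal sketch of Rosenthal's failsafe N computed from made-up z-values; the numbers are purely illustrative and have nothing to do with the studies in the meta-analysis:

```python
# Rosenthal's "file drawer" statistic: how many unpublished studies averaging
# zero effect would be needed to drag the combined one-tailed p-value above .05?
def failsafe_n(z_values):
    z_crit = 1.6449  # one-tailed critical z for alpha = .05, per Rosenthal
    k = len(z_values)
    return (sum(z_values) ** 2) / (z_crit ** 2) - k

z_published = [2.1, 1.8, 2.5, 0.9, 2.2]  # hypothetical z-scores from published trials
print(round(failsafe_n(z_published)))    # prints 28 for these made-up values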

The authors made their case for widespread dissemination and implementation of psychosocial techniques for pain management with statements such as "Of the studies included for review, larger effect sizes were associated with more rigorous designs that included monitoring of treatment implementation according to the study protocol." The suggestion was that the studies they reviewed embodied advanced clinical trial science, and that the better the science, the stronger the findings.

The average effect size reported for pain severity was 0.34, which was described as moderate. This is an arbitrary designation that was left unexplained. My recent blogs have discussed the claims by supporters of psychotherapy that a similar effect size is "trivial" and "clinically insignificant" when obtained for antidepressants. I disagree, but I raise this point because many who will embrace the claims of the SBM authors would dismiss such an effect size if it were claimed for medication.
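For readers who want a concrete sense of what a standardized difference of 0.34 amounts to, here is a minimal sketch of my own, using Cohen's conventional benchmarks and the "probability of superiority" conversion:

```python
# What does d = 0.34 mean in practice? By Cohen's rough benchmarks
# (0.2 "small", 0.5 "medium", 0.8 "large"), it sits between small and medium.
# The conversion below gives the chance that a randomly chosen treated patient
# reports less pain than a randomly chosen control patient (50% = no effect).
from scipy.stats import norm

d = 0.34  # the summary effect size reported in the meta-analysis
prob_superiority = norm.cdf(d / 2 ** 0.5)
print(f"Probability of superiority ~ {prob_superiority:.2f}")  # about 0.60
```

Whether a roughly 60/40 advantage is "moderate," "trivial," or clinically meaningful is a judgment call, which is precisely why the label should have been explained rather than simply asserted.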

The authors acknowledged problems in the studies that they reviewed: only 20 percent of the studies concealed from the investigators which treatment patients were being provided, many of the pain measures used as outcomes were crude and unvalidated, and raters were rarely blinded to patient group assignment. But whatever problems they found apparently did not deter them from concluding that the weight of evidence suggested moving forward with implementation, rather than waiting for more and better-quality research to provide a definitive answer.

Such prematurely enthusiastic reviews of the literature can have the effect of discouraging further funding for research, because the key questions are presented as already having been settled.

What I found in the details of the meta-analysis

Authors need to provide sufficient transparency so that readers—other researchers, policymakers, clinicians, and savvy consumers—can decide for themselves whether the evidence warrants their conclusions.

At first appearance, these authors seemed to have succeeded admirably. Table 2 summarized the characteristics of interventions. Figure 2 displayed a forest plot of effect sizes, with the largest positive effect sizes at the top and negative effect sizes at the bottom. Tables A1 and A2 in the Appendix provided descriptions of the individual studies.

However, the important question of whether the meta-analysis depended heavily on small studies of poor quality took some sleuthing to answer.

Inexplicably, when I looked to see how many patients there were in the intervention and control groups, I frequently encountered an "NR," indicating that the number of patients in the intervention and control groups was not reported. I went to the original articles and quickly saw that the problem was in the meta-analysis, not a lack of information in the original articles. I cannot understand how none of the nine authors of the meta-analysis noticed this glaring omission before the article came out, and it should leave readers wondering what other problems are lurking in the meta-analysis.

The classification of intervention (skills training or education) and control groups (usual care, component, or equivalent treatment) was too broad and vague in the table. To fully evaluate this issue, a reader would have to go to the original trials entered into the meta-analysis.

Of the 18 conditions categorized as skills training, six had only one or two sessions of relaxation or hypnosis, others had 3 to 12 sessions of cognitive behavioral therapy, and one had 52 sessions of training in self-hypnosis. It is not clear how a single session without opportunities for subsequent practice and feedback constitutes skills training, or how interventions so vastly different in intensity and content can be lumped together as comparable for the purposes of meta-analysis.

If categories for classifying studies are too broad or vague, summary effect sizes may not apply to particular types of interventions or to any intervention included in the meta-analysis. Think of it: in this meta-analysis, results from a relatively well-designed trial evaluating hypnosis were combined with results of a small, poorly conducted trial of soothing music. The music intervention produced one of the largest effect sizes in the meta-analysis, but in the wrong direction. If the results of this trial are to be believed, patients getting soothing music experienced much more pain than those who did not. A summary effect size from these two studies would not characterize either of the interventions.
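A small illustration of the point, with entirely hypothetical numbers chosen only to mimic the situation just described, shows how an inverse-variance weighted summary can fail to describe either of the trials it combines:

```python
# Fixed-effect (inverse-variance weighted) pooling of two hypothetical trials:
# a positive hypnosis trial and a negative soothing-music trial.
def pooled_effect(effects, std_errors):
    weights = [1 / se ** 2 for se in std_errors]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

effects = [0.60, -0.70]     # hypothetical effect sizes: hypnosis (+), music (-)
std_errors = [0.25, 0.40]   # hypothetical standard errors
print(f"Pooled d ~ {pooled_effect(effects, std_errors):.2f}")  # about 0.23
```

The pooled value of roughly 0.23 characterizes neither a hypnosis effect of 0.60 nor a music effect of -0.70; it is an average of apples and oranges.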

Ratings of study quality are quite important, given the known limitations of the literature. There are a number of systems for grading the quality of evidence, notably a widely accepted and simple tool from the Cochrane Collaboration. Unfortunately, the authors of the meta-analysis made unspecified "adaptations" to another scale for rating study quality and reported them in a table. They claimed that their ratings were designed to "identify studies that are generalizable, internally valid, and contain interpretable data." But their criteria for rating quality appear to be liberal, for instance, not explicitly identifying failures to provide intent-to-treat analyses.

Nonetheless, the authors' modified criteria indicated that in 11 of 38 studies, groups were not similar at baseline on important prognostic indicators; in 16 of 38 studies, measures of outcome were not obtained from more than 85 percent of the patients at baseline; 10 of the studies did not provide basic descriptive data for at least one key outcome; most (25 of 38) did not use a treatment manual; most (31 of 38) did not monitor treatment implementation; and 11 studies did not provide information concerning patients lost to follow-up.

Overall, what the authors reported is not sufficient to estimate the extent of their reliance on smaller, poor-quality studies, but there is strong suspicion that such reliance is the case, a suspicion that has to be confirmed by going to the original studies.

What I found going back to the actual studies

The meta-analysis depended heavily on small trials. Of the 38 trials, 19 had fewer than 35 patients in the intervention or control group and so would be excluded by applying this criterion. As noted below, two of the other largest trials should have been excluded for other reasons.

Of the only 13 studies that individually had significant effects on pain severity (confidence intervals excluding 0.0 in the forest plot in Figure 2), eight would have been excluded because they were too small and one because it should not have been included in the first place.

For the four studies with the largest effect sizes, one had only 20 patients receiving relaxation; the next largest had only 10 patients who were hypnotized; the next, 20 patients listening to a relaxation tape and 20 patients getting live instructions, numbers obtained only by replacing patients who dropped out; and the study with the fourth largest effect size had 15 patients receiving training in self-hypnosis.

By far the largest study included in the meta-analysis, which also had one of the highest effect sizes, evaluated telephone monitoring of treatment for depression and pain. Patients receiving the intervention had more access to medication than patients in the control group, who often remained unmedicated. CBT was also available to some patients in the intervention group, but its contribution could not be isolated. At best, this is a study of a multi-component, complex intervention in which the effects of the specific psychosocial component cannot be distinguished. It therefore should not have been included.

Another study with one of the largest numbers of patients in the intervention group had no control group and so was not a randomized trial. No other trial included in the meta-analysis appeared to be nonrandomized, and so I concluded that its inclusion in a meta-analysis of randomized trials was a mistake.

Some of the trials were very small indeed. One had 7 patients receiving an education intervention; another had 10 patients getting hypnosis; another, 15 patients getting education; another, 15 patients getting self-hypnosis; and still another, 8 patients getting relaxation and 8 patients getting CBT plus relaxation.

No trials excluded because of poor quality

Many of the studies did not provide sufficient detail to determine whether all patients were included in analyses or how randomization was conducted, and none allowed assessment of how many outcome measures were assessed but not reported because results were not significant.

Apparently, in their systematic review of the literature, the authors encountered no studies that had to be excluded for being too small, of too low quality, or both.

Other problems in interpreting the meta-analysis

There was rarely any indication in the studies of how concurrent medication was taken into consideration, or of whether patients assigned to a psychosocial intervention might get more medication because the added professional attention to their pain increased the likelihood of their being medicated or of having their pain control with medication monitored. An author of at least one of the studies has gone on record indicating that professionals have a responsibility to refer patients who are in pain for medication, irrespective of whether it is part of the intervention, but that seems to conflict with the requirements of a randomized controlled trial.

It appears that listening to music, being hypnotized during a medical procedure, and being taught self-hypnosis over 52 sessions all fall under the rubric of skills training. Similarly, interactive educational sessions are considered equivalent to passing out informational materials and simple pamphleteering.

For the purposes of obtaining outcome data for the meta-analysis, information collected two hours after a single session of hypnosis for procedural pain was considered equivalent to data obtained after a year of self-hypnosis training for chronic pain. Overall, some highly different interventions, outcomes, and outcome assessment points were thrown into the meta-analysis blender to yield the estimated effect size of 0.34.

Perhaps most importantly from a cancer pain control perspective, there was no distinction made among procedural, acute, and chronic cancer pain. These types of pain require very different management strategies. In preparation for surgery or radiation treatment, it might be appropriate to relax or hypnotize the patient or provide soothing music, and the efficacy of doing so could be examined in a randomized trial. But the management of acute pain is quite different and is best achieved with medication. Here is where the key gap exists between the known efficacy of medication and the poor control achieved in the community, due to professional and particularly patient attitudes. Control of chronic pain, months after any painful procedures, is a different matter altogether, and based on studies of noncancer pain, I would guess that here is another place for psychosocial intervention, but that should be established in randomized trials.

Almost no attention was given to differences in control groups. This is ironic because two of the authors of the meta-analysis have provided an excellent dissection of the meaninglessness of a routine care control condition when there is no effort to describe the specific nature of the routine care or to ascertain whether it is even adequate care. In the context of routine cancer pain control in the community, we can expect that adequate control with medication is simply not occurring at least half the time, and the provision of any attention and support might be an improvement. An intervention having a larger effect size than inadequate care may say nothing about the efficacy of that particular intervention.

The publication record of only one of the nine authors of the meta-analysis suggests any research on cancer pain, and perhaps that is why the conduct and interpretation of this meta-analysis are so silent with respect to the inadequacies of routine cancer pain control in the United States.

What do we conclude if we are committed to remaining evidence-based?

The lack of evidence is not evidence of a lack of efficacy, but we need more research.

The lack of adequately powered, high-quality studies precludes calculating a meaningful effect size of the kind that would be needed to justify wholesale dissemination and implementation. Perhaps a few high-quality studies of specific interventions are suitable for combining and calculating effect sizes for those specific interventions, but caution should be exercised in generalizing beyond them.

Think of it: a meta-analysis of whether medication was effective for cancer pain would not be published if it did not qualify its conclusions with what medication, for what type of cancer pain, under what circumstances. It would not end with a sweeping judgment that left consumers with the impression that any medication at any stage of cancer pain (aspirin for pain after radical mastectomy?) was empirically justified.

We should hold proponents of the widespread dissemination and implementation of psychosocial interventions for cancer pain to the same standards.

The most valid conclusion is that more and better research is needed to define a specific role for psychosocial interventions to improve cancer pain control. Calls for widespread dissemination and implementation are premature.

Systematic reviews and meta-analyses that are aimed at policy makers, clinicians, and patients and their families need to be contextualized with what is known about the large gap between the evidence for the efficacy of cancer pain control with medication, and the inconsistent but generally poor quality with which pain is controlled in routine care.

 "Nothing would have a more immediate effect on quality of life and relief from suffering, not only for the cancer patients but also for their families, than implementing the knowledge accumulated in the field of palliative care" Cancer and Palliative Care Unit of the World Health Organization

If we put aside professional allegiances and narrow guild interests, we might just find that the best application of behavioral medicine is to ensure better clinician and patient support and education with respect to cancer pain medications, particularly during acute pain. Certainly that focus deserves more attention than it has received thus far.

What else could the authors have concluded about psychosocial interventions for cancer pain?

I am confident that there is considerable potential for a role for psychologists and other behavioral medicine specialists in education as well as skills training. But we are more likely to get the research to demonstrate that if we do not prematurely claim that convincing evidence is already there.

The authors of this meta-analysis seemed intent on being able to justify widespread dissemination and implementation of psychosocial interventions for pain. Lacking sufficient evidence to do so, what could they have done?

  • Acknowledge the lack of high-quality studies. Perhaps some tentative generalizations could be made from research concerning interventions for non-cancer chronic pain. The recommendation would be that more research extending this work to cancer pain be given priority over widespread dissemination and implementation based on weak studies of cancer pain.
  • Concede that the evidence is not yet there, but nonetheless declare a consensus that efforts should be mounted for widespread dissemination and implementation. The senior author on the meta-analysis modeled that approach in recommending psychosocial interventions for cancer more generally. Hardly satisfying, but it is at least honest, and it leaves intact the commitment to arriving at the best evidence rather than subverting the process of determining best evidence to serve guild interests.

Stay tuned for an upcoming blog: Is Pharma Hijacking Screening Cancer Patients for Distress?
