Verified by Psychology Today


Why Psychotherapy Efficacy Studies Are Nearly Impossible

Scientifically valid therapy outcome studies are devilishly difficult to design

The purveyors of cognitive-behavioral psychotherapy (CBT), one of the many “schools” of thought in the fields of psychology and psychiatry, like to tout their randomized controlled outcome studies (RCTs) as proof that theirs is the most “evidence-based” type of psychotherapy.

Unfortunately, RCTs of psychotherapy are quite different from, say, drug studies, because a nearly infinite number of factors help determine whether a course of a given type of psychotherapy will lead to a positive outcome, and there is simply no way to control for them all. We cannot even agree on what a “successful” result should be. Symptom relief? Personality change? Improved relationships? Better ability to love and work? Personal growth and fulfillment? All of the above?

John F. Clarkin is a highly respected psychotherapy researcher who has perhaps the most experience of anyone in the field. He recently published an article in the Journal of Personality Disorders (Vol. 26 (1), Feb. 2012, pp. 43-62) entitled “An Integrated Approach to Psychotherapy Techniques for Patients with Personality Disorder.” In it, he makes several points that I consider crucial to the debate about the various treatment ideologies.

John F. Clarkin, Ph.D

First, he points out, the empirically "validated" models often focus only on symptoms and not on the more important and enduring aspects of personality. In fact, in longitudinal studies of affected individuals, the personality disorder criteria and symptoms change over time, often all by themselves, while their interpersonal dysfunction does not change very much at all.

This implies that, while symptom reduction is important, it is the interpersonal issues that should be the major long-term focus in therapy. The heart of the matter in personality disorders is the patient’s conception of self and others. The ultimate goal of treatment should be interpersonal functioning that allows for pleasure, interdependence, and intimacy in relationships.

Second, the literature on outcome studies is based on average scores on symptom-based outcome measures. This covers up the obvious fact that in any treatment, some patients change and some do not. This is further complicated by the issue of “comorbidity.” Patients with borderline personality disorder (BPD), for instance, often meet criteria for one or more additional personality disorders, not to mention additional psychiatric disorders. And even within the definition of a single personality disorder, many different combinations of traits can add up to the diagnosis. Much more so than in any other field of medicine, each patient with a personality disorder is unique. Therefore, no one treatment can or will work for a majority of patients.
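To make the point about averages concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and comes from no actual study; the names `responders`, `non_responders`, and the improvement threshold of 10 points are assumptions of mine. It simply shows that two groups can report the same average improvement even though, in one of them, half the patients did not change at all.

```python
from statistics import mean

# Hypothetical symptom-score reductions (higher = more improvement).
# These numbers are invented purely for illustration.
responders = [20, 22, 18, 20]      # patients who changed a great deal
non_responders = [0, 1, -1, 0]     # patients who barely changed at all
mixed_group = responders + non_responders

# A second invented group in which every patient improves moderately.
uniform_group = [10] * 8


def summarize(changes, threshold=10):
    """Return the mean change and the share of patients at or above threshold."""
    share = sum(c >= threshold for c in changes) / len(changes)
    return mean(changes), share


mixed_mean, mixed_share = summarize(mixed_group)
uniform_mean, uniform_share = summarize(uniform_group)

print(f"mixed group:   mean={mixed_mean}, share improved={mixed_share:.0%}")
print(f"uniform group: mean={uniform_mean}, share improved={uniform_share:.0%}")
```

A study that reported only the group mean could not tell these two situations apart, which is why the share of patients who actually changed is worth reporting alongside it.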

Third, to ensure that all therapists in a study are doing roughly the same thing, the studies have to employ instruction manuals called treatment manuals, and measure whether or not a therapist in a study is doing what he or she is supposed to be doing. However, as Clarkin states, “A close examination of the treatment manuals…suggests that each manual contains some strategies that are unique and essential to the treatment, and some that are common (sometimes with different jargon) with other approaches."

A fourth important point he makes is that all of these therapies consist of multiple interventions, and the studies do not show which ones are important and which ones are not, or even more importantly, which ones may even be counterproductive: “…most probably contain low doses of effective practices, ancillary but important aspects that make delivery of the treatment more palatable, superstitious behaviors (those we think that matter but do not), and factors that impede or fail to optimize therapeutic change.”

A fifth point he makes that I would like to mention is that the delivery of the techniques is often more important than the techniques themselves. Techniques can be done skillfully, “…or in an abrasive, authoritarian, or uninterested aloof way. There is plenty of research data that suggests that the skill of the therapist can be, in many instances, far more important to good results than any individual technique." Clarkin adds, “The therapist is not a technique-dispensing machine. Many of the techniques are applied common sense, and could be read out of a book."

Last, let us not forget that the receptivity of the patient is another major factor in whether or not therapy is successful. If patient factors are not taken into account, the effectiveness of any technique “approaches zero.” Furthermore, despite the rejection of the concept of transference by CBT therapists, “Some patients with severe needs for attachment with no relationships outside of treatment may become intensely attached to and preoccupied with the therapist in ways that are detrimental to growth.”

In short, it makes a lot more sense to integrate the various techniques across treatment strategies from the treatment manuals in a way that tailors them to the particular patient in front of the therapist. Throughout treatment, individual decisions must be made, which takes a skillful therapist indeed.

Psychotherapy outcome research can never be the only standard by which the "science" of human behavior-change technology should be measured. In fact, it's not even the gold standard.

In order to better understand ourselves and what leads us to change our behavior, we must use ALL available sources of information. We have to look at the widespread clinical experience of psychotherapists who use a variety of techniques and theories with a variety of clinical populations. We have to look for potential biases in both clinical studies and within an individual therapist's anecdotes and the conclusions that we draw from them. In forming conclusions about both anecdotal and controlled-trials data, we have to look at a wide variety of possible explanations, as well as for any information or experiences that would seem to contradict those explanations.

We have to look at historical and sociological trends. We have to look at new knowledge from the neurosciences that might account for findings that are difficult to explain or reconcile with other beliefs. We must look at evolutionary biology. We must examine our own beliefs for logical inconsistencies. We have to think honestly about ourselves and what makes us tick personally.

Attention must be paid to the following additional issues, as described in my book, How Dysfunctional Families Spur Mental Disorders.

1. The Problem of the “False Self”

People do not act the same way in all social contexts. They do not act or speak the same way around a boss as they do when they are alone with a lover. A man’s behavior in a strip club is very different from his behavior when he is playing with his children. We have different “faces” or masks which we apply to ourselves in different environments. Not infrequently, these masks are meant to manipulate others to get them to do what we want them to do. Some of the masks are so rigid and pervasive that they become what therapists call a "false self." I have described several of these in previous posts.

Additionally, I never cease to be amazed at how mental health professionals and researchers seem to believe that they really know what is going on in a patient’s or a research subject’s life based solely on the self-report of the patient, or solely on the reports of the patient’s intimates, or even on the reports of people like teachers who observe the behavior of children in only one context that includes thirty other distracting students. If these professionals were asked whether they believe that people often act differently in public than they do behind closed doors, they would of course say yes, but they seem to develop amnesia for this fact in discussions and in studies.

A patient’s family members may be just as motivated to give a distorted view of a patient as the patient is. Parents, for example, may prefer to believe that their child has some sort of mental defect, so as not to experience as much of their own covert guilt about their parenting skills. Conversely, some may actually prefer to blame the child’s behavior completely on themselves, in order to let their “perfect” child off the hook. Most mental health practitioners do not make home visits to watch patients and family members interact in their natural environment. Even if they did, unless they had a camera operating twenty-four hours a day as in the movie The Truman Show, they could still be easily deceived.

2. Double Blinding

When it comes to psychotherapy treatment outcome studies, we cannot do double-blind, placebo-controlled comparisons of two different types of psychotherapy treatments. This is true because, in a sense, the therapist – or more correctly the relationship between the patient and the therapist – is the treatment. If the study were to meet the criteria for being double-blind, that would mean that the therapists who administer the treatment would have to not know what they were doing.

Of course, they cannot administer psychotherapy without being aware of what techniques they are using. If they could, that would mean that they were incompetent. Not a fair test of a treatment! The fact that the therapy relationship is one of the basic aspects of the treatment also makes placebo or “sham” treatments difficult, because any relationship has some effect on an individual.

Does this mean we should give up on evaluating psychotherapy treatment scientifically and rely exclusively on clinical anecdotes? Of course not. Studies are still important. We just have to understand their limitations.

3. The Allegiance Effect

A large number of different schools of psychotherapy have different approaches to the understanding of and methodology for changing a patient’s repetitive dysfunctional habits. Most of these therapy schools were designed by charismatic and creative individuals who based their ideas on clinical anecdotes. These innovators are highly invested emotionally in their own personal theories, and want them to look good in comparative psychotherapy outcome studies.

This leads to a so-called allegiance effect in RCTs. The preferred psychotherapy school of the researcher is likely to be delivered more enthusiastically and with more rigor to subjects in the study than is the competing therapy treatment. One survey study examined 29 RCT outcome studies that compared one type of therapy to another and found a correlation of .85 between researchers’ therapy allegiance and outcome. That is, the stronger the researchers’ allegiance to a treatment, the better that treatment tended to fare against its competitor. Just as in sponsored drug studies, this number is way too high to discount the presence of a significant bias in the studies.

When differences are found between two therapies, they are often statistically but not clinically significant. They show that one therapy is slightly more advantageous than the other, but that the actual improvement of the subjects was so minimal as to be inconsequential.
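The gap between statistical and clinical significance can be shown with a little arithmetic. The sketch below is only an illustration under stated assumptions: the sample size, the means, the standard deviation, and the 5-point "minimal important difference" are all numbers I made up, and the large-sample z-test is one simple choice of test among several. With enough subjects, a one-point difference on a symptom scale clears the conventional p < .05 bar while remaining far too small to matter to a patient.

```python
from math import sqrt
from statistics import NormalDist

# Invented summary statistics for two therapy arms (not from any real study).
n_per_arm = 1000           # subjects in each arm
mean_improvement_a = 11.0  # average symptom-score improvement, therapy A
mean_improvement_b = 10.0  # average symptom-score improvement, therapy B
sd = 10.0                  # standard deviation, assumed equal in both arms

# Hypothetical threshold: the smallest difference a clinician would care about.
minimal_important_difference = 5.0

difference = mean_improvement_a - mean_improvement_b

# Two-sample z-test on the difference in means (large-sample approximation).
standard_error = sqrt(sd**2 / n_per_arm + sd**2 / n_per_arm)
z = difference / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Standardized effect size (Cohen's d) using the common standard deviation.
effect_size = difference / sd

statistically_significant = p_value < 0.05
clinically_meaningful = difference >= minimal_important_difference

print(f"z = {z:.2f}, p = {p_value:.4f}, d = {effect_size:.2f}")
print(f"statistically significant: {statistically_significant}")
print(f"clinically meaningful:     {clinically_meaningful}")
```

The p-value shrinks as the sample grows, but the effect size does not, which is why a reader should look for the size of the difference and not just the significance test.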

When both groups of therapists who are providing the treatment in the study are equally committed to the paradigms they are delivering, comparative psychotherapy outcome studies almost always result in a tie. In psychotherapy research circles, it is known as the “dodo bird verdict.” This refers to a character from Alice in Wonderland, the dodo bird, who in one passage said “Everybody has won, and all must have prizes.”

4. Treatment as Usual

Lately, many researchers engaged in psychotherapy RCTs have employed a control group called treatment as usual (TAU), which really stacks the deck in favor of their pet psychotherapy school. Lining up practitioners of a different school from the researcher’s to act as therapists for a comparison group is often difficult. Additionally, if the researcher were able to do so, the study might not show his or her therapy to be superior to another type. For these reasons, the use of TAU control groups has become almost epidemic in psychotherapy RCTs.

Subjects randomly assigned to the TAU condition, which serves as a comparison group to the group of subjects receiving the researcher’s therapy model, are simply released back into the community to get whatever other treatments are already out there. Some may see practitioners from other therapy schools, some may get medications, some may get both, and many others may get neither. Both the TAU group and the experimental group are followed up at equal time intervals and given all the same outcome measures. The psychotherapy methodology that serves as the investigated treatment always seems to beat TAU.

The reader should understand that within any widely-practiced therapy model, both good therapists and bad therapists can be found, just as practicing physicians can be either good or bad psychopharmacologists. For the subjects in the TAU condition who receive treatment, the average results of the good clinicians in the community are probably cancelled out by those of the bad ones. TAU subjects may also be seen less frequently. Some, as I mentioned, are getting no treatment at all.

Meanwhile, the experimenter’s group is usually getting a lot more individualized attention by therapists who are highly committed to the particular treatment model. The experimenter’s therapy is provided by uniformly well-trained and enthusiastic therapists under very well controlled conditions. The psychotherapy that is provided is applied with rigor and consistency, and is scrutinized by other observers through the use of videotapes of the sessions. Therapists who make errors are supervised almost immediately. On top of this, research therapists often have caseloads that are very much smaller than those of folks out in practice, allowing them to spend more time deciding how to approach the clinical issues they face.

These factors may be the real reasons their patients do better than those receiving TAU. I cannot recall a single instance of a study in which TAU beat another therapy delivered in such a manner. If it did, I would have to wonder how the experimenter could have possibly accomplished such an unlikely feat.

5. Funding Issues

Another big issue for the field concerns which psychotherapy treatment outcome research studies get funded. If a scientist cannot get money for a project, it will rarely get off the ground, because these types of studies are very expensive to mount. Most successful psychotherapy RCTs have employed CBT because of the predominance of this model in professional psychology training programs, and because most CBT treatments aim primarily for symptom reduction, which is relatively easy to measure, rather than personality change, which is not.

6. Problems with Generalizability of Study Results

Another big problem with all psychotherapy RCTs is that most studies require that the subject population be homogeneous, meaning that the subjects in the study must be very similar in the nature of the disorder they exhibit, and in how severe it is. This requirement means that patients who have more than one DSM disorder or psychological problem are often excluded from studies. In contrast, most patients seen in practice, at least by psychiatrists, have more than one disorder (comorbidity). This fact alone limits what is termed the generalizability of the study’s findings. We do not know from a study using patients who have only one disorder if the treatment employed in the study would work as well with patients who have multiple problems.

Because studies need subjects who will stick with the treatment until the end of the research project, some subjects who have certain characteristics that are common in clinical practice tend to be excluded. This problem further limits the generalizability of the study. For instance, most studies of treatments for depression exclude patients who are suicidal!

Another problem with RCTs is that many subjects drop out of a study or are removed from a study as it proceeds because they do not completely cooperate with the treatment in some way. CBT studies, as do many others, tend to have a fairly high dropout rate. The subjects who end up completing the study are usually the most motivated to change, and would therefore be expected to do better than those who lack this motivation. This all makes the treatment method look much better than it would be if it were employed in a typical clinical practice setting.
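The effect of dropouts on the apparent result can be sketched with a toy calculation. The patient records below are invented, and I am assuming, purely for illustration, that the less motivated patients who drop out would have shown little improvement had they been followed up. Comparing the average for completers with the average for everyone who enrolled shows how much a completers-only analysis can flatter a treatment.

```python
from statistics import mean

# Invented records: (improvement in symptom score, completed the study?).
# Dropouts are assumed here to be the less motivated, slower-improving patients.
patients = [
    (12, True), (10, True), (11, True), (9, True),
    (2, False), (1, False), (3, False), (0, False),
]

# Analysis restricted to subjects who finished the study.
completer_scores = [improvement for improvement, completed in patients if completed]
completer_mean = mean(completer_scores)

# Analysis of everyone who was enrolled, dropouts included.
all_scores = [improvement for improvement, _ in patients]
all_enrolled_mean = mean(all_scores)

dropout_rate = sum(not completed for _, completed in patients) / len(patients)

print(f"dropout rate:            {dropout_rate:.0%}")
print(f"mean among completers:   {completer_mean}")
print(f"mean among all enrolled: {all_enrolled_mean}")
```

In real studies the outcomes of dropouts are often unknown, which is exactly the difficulty: the analyst has to make some assumption about them, and the choice can change the conclusion.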

7. Therapist Flexibility

Unfortunately, a good therapist has to be flexible and employ a variety of different strategies in ways that are tailored to the proclivities and sensitivities of the patient in front of them. Generic interventions may not only fail to work, they may backfire and make matters worse. Verbal interventions must often be phrased differently to different patients in order to reduce the patient’s defensiveness.

Treatment and adherence manuals take away a lot of this flexibility, so that the therapy as performed in an RCT is not always similar to the way that clinicians out in the field practice it. Creating treatment manuals can itself be a daunting task. I heard a respected researcher tell a group of other researchers that his team was having trouble designing such a manual because the founder of the treatment they wanted to study was observed to perform psychotherapy completely differently from his own wife, who supposedly was a practitioner of the same model of therapy.

More from David M. Allen M.D.