Verified by Psychology Today

The DSM-5 Field Trials' Decidedly Mixed Results

Far from being a ringing endorsement, the field trials set off fresh alarm bells

“What’s the chance that a second, equally expert diagnosis will agree with the first, making a particular diagnosis reliable?” asks David Kupfer, chair of the DSM-5 task force, of the decidedly mixed results of the DSM-5 field trials. First off, are you sure you really want to know?

Kupfer, who has taken to writing for the Healthy Living section of the Huffington Post to promote and defend the manual, adds helpfully: “A reliability of 1 means that the two diagnoses will always agree; a reliability of 0 means that the second is no more likely to agree than it is to disagree.”
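To make that 0-to-1 scale concrete, here is a minimal sketch of one common chance-corrected agreement statistic, Cohen's kappa. This is an assumption for illustration only: the passage quoted above does not name the statistic the field trials used, and the function name `cohen_kappa` and the patient labels below are hypothetical, not trial data. On this scale, 1 means the two clinicians always agree and 0 means they agree no more often than chance alone would predict.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Chance-corrected agreement between two raters' labels (illustrative only)."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(first)
    # Share of patients on whom the two diagnoses actually agree.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance, given how often each rater uses each label.
    counts_1, counts_2 = Counter(first), Counter(second)
    expected = sum(counts_1[label] * counts_2[label] for label in counts_1) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters gave every patient the same single label
    return (observed - expected) / (1 - expected)


# Hypothetical diagnoses for eight patients, each seen by two clinicians.
first_opinion = ["MDD", "MDD", "none", "none", "MDD", "none", "MDD", "none"]
second_opinion = ["MDD", "none", "none", "MDD", "MDD", "none", "none", "none"]

print(cohen_kappa(first_opinion, second_opinion))  # 0.25
```

In this made-up example the clinicians agree on five of eight patients (62.5 percent), yet the chance-corrected score is only 0.25, because half of that agreement would be expected by chance alone.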

Keep that in mind when you hear that nine out of 23 adult or child diagnoses, generated under real-world conditions, indicated “questionable” to “unacceptable” diagnostic reliability. The criteria, according to Psychiatric News, were tested from October 2010 until February 2012 by 279 clinicians at 11 academic centers in the United States and Canada.

In its press release on the results, the American Psychiatric Association was anxious to stress, “14 of the 23 adult or child psychiatric diagnoses had ‘very good’ or ‘good’ reliability. Among these were autism spectrum disorder and ADHD in children and posttraumatic stress disorder and binge-eating disorder in adults.”

That’s certainly notable, even if binge eating was defined as “discrete episodes in which the individual uncontrollably eats a larger amount than most people would in a similar time and under similar circumstances” (my emphasis). “Most people”? “Circumstances similar” to what, exactly? But consider the disorders, proposed and existing, whose unreliability stood out so blatantly that even Kupfer, sounding generally pleased, was forced to concede: “Regardless of why, we acknowledge that the relatively low reliability of major depressive disorder and generalized anxiety disorder is a concern for clinical decision-making” (my emphasis).

No kidding. When Kupfer states that “two of the most commonly diagnosed conditions” were “unsuccessful in meeting the standards set for DSM-5,” not only is there legitimate cause for concern; there's also ample justification for asking how and why these unreliable conditions were added to earlier editions of the manual in the first place.

One of the strangest discoveries to come to light from the latest trials, at least for me, is that while the DSM-5 trials were designed to emulate real-world conditions, the equivalent trials for DSM-IV were not. As Kupfer notes, “As part of that process two decades ago, patients were carefully screened.” In fact, he, Darrel Regier (vice chair of the DSM-5 task force), and several other researchers also doubling as task force members add, quite matter-of-factly, “the DSM-IV field trials enrolled carefully selected patients likely to have the target disorder” (my emphasis). In short, the patients weren’t statistically representative; they were a highly selective, prescreened sample that would inevitably yield higher reliability results than a real-world sample.

The careful preselection, in DSM-IV trials, of patients with “the target disorder” is mentioned in the latest studies as if to inspire confidence in the trials that have just concluded. But when we learn, again from Kupfer, that major depressive disorder and generalized anxiety disorder were meant to “serve as reference disorders from the DSM-IV trials,” presumably because of their reliability, it’s all the more disconcerting to hear the task force members, vice chair, and chair collectively acknowledge: “evidence from the literature indicates that the current diagnostic criteria for a number of mental disorders are unclear.”

How could they not be, given their combined vagueness and expansiveness? When I interviewed Robert Spitzer about the creation of generalized anxiety disorder, he told me: “We came up with that name [GAD] after we had anxiety neurosis in DSM-II, and if you had panic then there had to be something that was left over. So that became Generalized Anxiety Disorder” (qtd. in Lane, Shyness 76). In his history of psychiatry The Antidepressant Era, David Healy underscored the basic truth of that account, though he put it less charitably: “Floundering somewhat, members of the anxiety disorders subcommittee stumbled on the notion of generalized anxiety disorder (GAD), and consigned the greater part of the rest of the anxiety disorders to this category” (193).

Still, even as Kupfer acknowledged his and the task force’s “concern” about “the relatively low reliability of major depressive disorder and generalized anxiety disorder” (“relatively low” being a euphemism, presumably, for the terms “questionable” and “unacceptable” that appear in the actual published study), he tried immediately to quash concern and especially criticism:

Some DSM-5 detractors have spotlighted the six [unreliable diagnoses] as indicative of flaws in the field trials, especially because this group included major depressive disorder and generalized anxiety disorder, two of the most commonly diagnosed conditions. The opposite is closer to the truth. Rather than discrediting the field trials, the outcome here reveals the critical value of how the trials were constructed and conducted and how we are moving forward.

True, the outcome does reveal “the critical value of how the trials were constructed,” especially in bringing to light the fundamental unreliability of “two of the most commonly diagnosed conditions” in American psychiatry. But, again, it’s hard to view that as a net plus when one considers the millions of people who’ve been given diagnoses that are technically so unreliable one cannot—indeed, should not—presume two psychiatrists could identify or distinguish them. (To offer further reason for concern: in the year 2000 alone, between 3,000 and 5,000 North Americans began a new course of drug treatment for generalized anxiety and/or social anxiety disorder every day.)

“Strategies need to be developed to address the problem” of the disorders’ unreliability “as the manual evolves into a living document,” concludes Kupfer vaguely. But the manual is already “a living document,” if one considers how many times it is invoked daily around the world—not just in American schools, courts, prisons, insurance offices, and of course patient rooms. And it’s not even clear from the recently published results that the DSM-5 task force will consider such “rigorous, empirically sound evaluations” binding in its decision-making.

On the contrary, as the authors of one of the studies put it (again including Regier and two other task force members), “The results of the field trials were intended to inform the DSM-5 decision-making process, but in and of themselves would not determine the inclusion or exclusion of diagnoses in the final version of DSM-5.”

I hope you find that reassuring. Follow me on Twitter: @christophlane


Clarke, Diana E., William E. Narrow, Darrel A. Regier, et al. “DSM-5 Field Trials in the United States and Canada, Part I: Study Design, Sampling Strategy, Implementation, and Analytic Approaches.” Am J Psychiatry 2012; 10.1176/appi.ajp.2012.12070998

“DSM-5 Field Trials Posted Online by AJP.” Psychiatric News Alert: The Voice of the American Psychiatric Association and the Psychiatric Community (October 30, 2012).

Healy, David. The Antidepressant Era. Cambridge, Mass.: Harvard University Press, 1997.

Kupfer, David J. “Field Trial Results Guide DSM Recommendations.” Huffington Post (November 7, 2012).

Lane, Christopher. Shyness: How Normal Behavior Became a Sickness. New Haven: Yale University Press, 2007.

Narrow, William E., Diana E. Clarke, David J. Kupfer, Darrel A. Regier, et al. “DSM-5 Field Trials in the United States and Canada, Part III: Development and Reliability Testing of a Cross-Cutting Symptom Assessment for DSM-5.” Am J Psychiatry 2012; 10.1176/appi.ajp.2012.12071000

Regier, Darrel A., William E. Narrow, David J. Kupfer, et al. “DSM-5 Field Trials in the United States and Canada, Part II: Test-Retest Reliability of Selected Categorical Diagnoses.” Am J Psychiatry 2012; 10.1176/appi.ajp.2012.12070999

More from Christopher Lane Ph.D.
More from Psychology Today