Being Bayesian in Insane Places
A brief discussion of the mathematics of Rosenhan experiments and follow-ups
Posted Jul 12, 2017
I am prompted to write this vignette after viewing an entertaining and faithful adaptation of Lauren Slater’s 2004 non-fiction book, “Opening Skinner’s Box”. In it she describes in some detail her attempt to replicate David Rosenhan’s 1973 experiment, which exposed a lack of consistency across diagnosticians. Briefly, Rosenhan sent eight fake patients to psychiatric hospitals complaining of atypical symptoms, and all were promptly labeled schizophrenic and admitted. The study caused quite a bit of controversy at the time and in some ways cemented the centrality of the Diagnostic and Statistical Manual (DSM) in clinical diagnosis. Slater later attempted to replicate the experiment and argued that the tendency to quickly label mental illness has not subsided, though perhaps the tendency to admit patients has. Her experience and book also generated controversy, which was described in an American Psychological Association publication more than ten years ago.
The main critique of Rosenhan’s experiment is methodological. Rosenhan treats diagnosis as a binary classification task, in which detecting 0 out of 100 fake patients (i.e., a sensitivity of 0% for picking up fakes) implies very poor diagnostic accuracy. In reality, the prior probability of encountering a fake patient is so low that detecting 0 out of 100 is not at all surprising even with a very good diagnostic test. Moreover, psychiatric diagnoses (or any medical diagnoses) are not designed to be (1) binary or (2) sensitive to fakes. One would hope that the emergency room picks up all the real heart attacks, even at the cost of also picking up a few fake ones and subjecting them to more definitive diagnostic tests. In psychiatry, the definitive test is observation over a longitudinal course that can span months.
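The base-rate argument above can be sketched in a few lines. The numbers here (a 1-in-10,000 prior for a fake patient, a 10:1 likelihood ratio from bedside impressions) are invented for illustration, not taken from any study:

```python
# A minimal sketch of the base-rate argument, with made-up numbers.
# Suppose 1 in 10,000 presenting patients is a fake (prior = 1e-4),
# and the clinician's bedside impression yields a likelihood ratio of
# 10:1 in favor of "fake" -- already a generous assumption.

def posterior_fake(prior, likelihood_ratio):
    """Posterior P(fake | evidence) via odds-form Bayes' rule."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

p = posterior_fake(1e-4, 10)
print(f"P(fake | suspicious presentation) = {p:.4f}")

# Even when the evidence points toward faking, the posterior stays
# near 0.1%, so the rational call is still "admit". Across 100 fake
# patients, calling 0 of them fake is exactly what a well-calibrated
# Bayesian diagnostician would do.
```

On these assumed numbers, the posterior probability of a fake stays around one in a thousand, which is why detecting none of Rosenhan’s pseudopatients says little about diagnostic accuracy.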
Typically, a psychiatrist who sees someone complaining of psychotic symptoms, like hearing voices, holds a few “differentials” in mind: a psychotic illness, a mood disorder, or something else. The probabilities assigned to these labels are not fixed; they shift as pieces of evidence arrive. This uncertainty strikes me as a good thing, though the DSM explicitly rejects diagnostic ambiguity, perhaps because whoever created it is not used to thinking about diagnosis as a Bayesian inference process.
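That inference process can be made concrete as sequential Bayesian updating over the differentials. The hypotheses, priors, and likelihoods below are all hypothetical numbers chosen for illustration:

```python
# A toy sketch of diagnosis as sequential Bayesian updating.
# All hypotheses and likelihood values are invented for illustration.

def update(prior, likelihoods):
    """Multiply prior beliefs by evidence likelihoods, then renormalize."""
    unnorm = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

# Initial differential after the chief complaint "hearing voices".
belief = {"psychotic illness": 0.5, "mood disorder": 0.3, "other": 0.2}

# Evidence 1: a recent severe depressive episode.
# Values are assumed P(evidence | hypothesis).
belief = update(belief, {"psychotic illness": 0.2,
                         "mood disorder": 0.7,
                         "other": 0.1})

# Evidence 2: symptoms persisting over six months of follow-up.
belief = update(belief, {"psychotic illness": 0.6,
                         "mood disorder": 0.3,
                         "other": 0.1})

for h, p in belief.items():
    print(f"{h}: {p:.2f}")
```

No single observation settles the diagnosis; each one reweights the differentials, which matches the longitudinal course described above better than a one-shot binary label.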
This brings us to today and Slater’s experiment. This is an important and timely discussion. Patients who fake symptoms show up to emergency rooms frequently, and many are now discharged, sometimes without a diagnosis. Because of criticisms like Rosenhan’s, restrictive treatments like hospitalization are used much less frequently today. The problem is that unpredictable events follow from some of these discharges. The controversies arise because people become emotionally entangled in the consequences of an incorrect diagnosis, but without thinking in a mathematically rigorous way it is difficult to push for progress. What should happen is a large-scale quantitative model of the posterior probability of unlikely but consequential events (like post-discharge violence), used to forecast and guide interventions, since human judgment has fundamental limits in making these decisions. However, it is difficult to articulate this to stakeholders, let alone to the public, because foundational concepts such as “prior probability” are deeply non-intuitive, and there are genuine legal consequences that need to be worked out (e.g., Bayesian reasoning is not immediately compatible with the presumption of innocence in jurisprudence). I don’t have a coherent proposal, but I think training policymakers in Bayesian reasoning (and, by extension, the important distinction between hypothesis testing and predictive modeling, which I have emphasized multiple times on this blog) would help.
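A short calculation shows why forecasting rare, consequential events is so hard, and why the “prior probability” concept matters for stakeholders. The base rate, sensitivity, and specificity below are hypothetical, not estimates from any real risk model:

```python
# Why rare-event forecasting is hard: with a low base rate, even an
# accurate predictor produces mostly false alarms. All numbers are
# hypothetical.

def positive_predictive_value(base_rate, sensitivity, specificity):
    """P(event | model flags the case), by Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# Suppose an adverse post-discharge event has a 1% base rate, and a
# model catches 80% of true cases while wrongly flagging 10% of the rest.
ppv = positive_predictive_value(0.01, 0.80, 0.90)
print(f"PPV = {ppv:.2f}")  # well under 10%: most flags are false alarms
```

Under these assumed numbers, fewer than one in ten flagged cases is a true positive, which is exactly the kind of trade-off a large-scale quantitative model would make explicit and a clinician’s unaided judgment cannot.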