In Studies of Forensic Errors, the Devil Is in the Details
A new study claims forensic errors are rare—but the data tell a different story.
Posted August 26, 2022 | Reviewed by Ekua Hagan
- A new study concluded that forensic handwriting examiners wrongly implicate innocent suspects only 3.1 percent of the time.
- A closer look at this study suggests that the real-world error rate is likely much higher.
- Journalists and researchers must work together to ensure that scientific findings are accurately communicated to the public.
Every day, forensic examiners compare pieces of evidence (e.g., fingerprints, bullets) and decide whether they “match”—i.e., came from the same source (e.g., person, gun). These decisions hold obvious legal implications, and although the public believes that errors are very rare, the unfortunate reality is that we simply don’t know how often forensic science errors occur. Thankfully the tide is turning, as government agencies have called for more “black box” studies to measure the error rates of widely used forensic methods.
Earlier this month, I came across a popular article titled “Forensic Experts Are Surprisingly Good at Telling Whether Two Writing Samples Match.” The article described a new study of forensic handwriting comparison which, according to the teaser, showed that “it is effective if an examiner has the right training.” It went on to explain that examiners in this study committed false positive errors—i.e., misjudged non-matching handwriting samples as a match, thereby wrongly implicating an innocent person—only 3.1 percent of the time.
That seemed like good news—but was it too good to be true? I decided to go straight to the source and read the research article itself. Sure enough, the very first statistic reported in the researchers’ own summary of their findings is that examiners made false positive errors only 3.1 percent of the time. But as I kept reading, I noticed more and more reasons to doubt that number.
Where Did 3.1 Percent Come From?
In the study, 86 forensic handwriting examiners each analyzed 100 sets of handwriting samples and decided whether each set did or did not “match.” Because the researchers knew whether each set was actually written by the same person, they could then calculate the number and types of errors that examiners made.
In total, the 86 examiners in this study made 6,576 unique judgments, including 2,863 judgments of matching sets and 3,713 judgments of non-matching sets. Of the latter, examiners incorrectly judged non-matching samples as having been written by the same person 114 times—hence a false positive error rate of 3.1 percent (114/3,713).
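For readers who want to check the arithmetic, here is a minimal sketch in Python. The counts come straight from the paragraph above; the variable names are my own and are not taken from the study.

```python
# Counts as reported above; variable names are illustrative only.
matching_judgments = 2863       # judgments of sets written by the same person
non_matching_judgments = 3713   # judgments of sets written by different people
certain_false_positives = 114   # non-matching sets judged to be a match

# The two groups should add up to the reported total of 6,576 judgments.
total_judgments = matching_judgments + non_matching_judgments

# Headline false positive rate: errors divided by ALL non-matching judgments.
false_positive_rate = certain_false_positives / non_matching_judgments

print(total_judgments)               # 6576
print(f"{false_positive_rate:.1%}")  # 3.1%
```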
Deciding Not to Decide
You may be wondering: With 86 examiners and 100 handwriting sets, shouldn’t there be 8,600 total judgments? There would be, but examiners weren’t required to judge all 100 sets, and some simply chose not to. In fact, only 45 examiners judged all 100 sets, while 16 examiners judged fewer than half of them.
We cannot know why some examiners skipped certain sets; perhaps they were too busy or experienced technical difficulties. But if examiners tended to avoid sets that were especially difficult—which is a luxury they don’t have in the real world—then the study results would overestimate their true ability. By analogy, imagine if your SAT score were based only on the questions that you chose to answer; some people would answer the easier questions, skip the harder ones, and receive unrealistically high scores.
Examiners in this study could also judge handwriting sets as “inconclusive”—an option they do have in the real world—if they felt there wasn’t enough information to justify a decision in either direction. Of the 3,713 total judgments of non-matching sets, 547 were inconclusive judgments.
Researchers have debated when, if ever, inconclusive judgments should be considered correct—but this study effectively counted them as always correct by including them in the denominator of the error rate calculation. That is to say, if an examiner selected “inconclusive” 100 times, they would have a 0 percent error rate (0/100). But in order to know how often examiners’ decisions were incorrect, we must remove the inconclusive judgments from that calculation, which would increase the error rate to 3.6 percent (114/3,166).
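The effect of the denominator can be sketched in a few lines of Python. The function `error_rate` below is a hypothetical helper of my own, not something from the study; it simply drops inconclusive judgments before dividing.

```python
def error_rate(errors: int, total: int, inconclusive: int = 0) -> float:
    """Share of errors among judgments, after dropping inconclusive ones."""
    decided = total - inconclusive
    if decided <= 0:
        # With no actual decisions, an error rate is undefined rather than zero.
        raise ValueError("no conclusive judgments to score")
    return errors / decided


# Inconclusives left in the denominator (the study's approach, as described above).
with_inconclusives = error_rate(114, 3713)

# Inconclusives removed, so only actual decisions are scored.
without_inconclusives = error_rate(114, 3713, inconclusive=547)

print(f"{with_inconclusives:.1%}")     # 3.1%
print(f"{without_inconclusives:.1%}")  # 3.6%
```

Note that the hypothetical examiner who answers “inconclusive” 100 times scores `error_rate(0, 100)`, a perfect 0 percent, under the first approach. Under the second approach that examiner has made no decisions at all, so the sketch raises an error instead of reporting a rate.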
A Matter of Degree?
But it gets much worse. When examiners made a decision in either direction, they also indicated whether they felt that their decision was “certainly” or “probably” correct. Importantly, decisions were only considered errors if the examiner was both incorrect and certain, which happened 114 times. But if an examiner judged a non-matching set as probably a match, it was not considered an error—and that happened an additional 147 times.
However, jurors do not make this distinction. In one recent study, for example, mock jurors were equally persuaded by the testimony of a firearms examiner regardless of whether he described his opinion as “certain” or “more likely than not.” Therefore, in practice, “probable” but incorrect decisions are no less harmful than certain ones—and including these raises the false positive error rate to a staggering 8.2 percent (261/3,166).
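Here is a rough sketch of that last step, again in Python with variable names of my own choosing. It adds the “probable” errors to the “certain” ones and compares the result with the headline figure. The final ratio is my own derived number, not one reported in the study.

```python
# Counts for non-matching sets, as reported above.
non_matching = 3713
inconclusive = 547
certain_wrong = 114    # "certainly" a match, but actually not
probable_wrong = 147   # "probably" a match, but actually not

conclusive = non_matching - inconclusive     # judgments with an actual decision
all_wrong = certain_wrong + probable_wrong   # every wrong "match" call

headline_rate = certain_wrong / non_matching  # the figure in the study summary
broad_rate = all_wrong / conclusive           # wrong calls among actual decisions

print(f"{all_wrong} wrong calls out of {conclusive} decisions")
print(f"headline rate: {headline_rate:.1%}")
print(f"broad rate:    {broad_rate:.1%}")
print(f"ratio:         {broad_rate / headline_rate:.1f}x")
```

Under these assumptions, the broader rate comes out at roughly 2.7 times the headline figure.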
All told, a closer look at this study tells a very different story than did the article that led me to it. In cases where examiners chose to render a decision, they wrongly implicated an innocent person 8.2 percent of the time. And, as in other forensic validation studies, those judgments were made under favorable conditions (e.g., lengthy samples, no time pressure), so the real-world error rate may be even higher. Moreover, these same researchers have since published a second “black box” study of footwear comparisons that raises similar concerns.
I do not mean to denigrate these researchers; their work is critical to improving forensic science, and this study was impressively ambitious, diligent, and transparent. Nor do I mean to denigrate the journalist who reported on it. Rather, my goal here is to highlight the value—but also difficulty—of clear and accurate scientific communication.
It is often tempting to reduce the findings of a complex study to a single buzzworthy, even if misleading, number. Journalists should be mindful that paywalls and scientific illiteracy prevent many readers from critiquing the original research for themselves, so journalists’ accounts are likely to be taken as fact. To help with this, researchers must also work with journalists to ensure that the reporting captures the nuance of the research. The devil may be in the details, but so too is the truth.