Are Academic Articles Getting Harder to Read?

A new study suggests they are.

Posted Apr 22, 2017

Presidential speeches, news stories, and novels all rely on simpler sentences and language than they did decades ago. In contrast, academic articles have become increasingly challenging to read. A new study, published online on bioRxiv, charts the decreasing readability of scientific papers in particular. The study measured the readability of over 700,000 scientific abstracts published between 1881 and 2015 from 122 well-regarded biomedical journals. Its authors not only document the decreasing readability of scientific articles but also voice concern over the impact on efforts to reproduce scientific results to validate important findings and on the accessibility of studies to researchers.

An editorial in the 30 March issue of Nature concurs, but with some important caveats. First, the sciences are hardly alone in the increasing impenetrability of academic articles. In fact, humanities journals may have less readable studies than scientific journals, where, to some extent, using a word like exon is a necessity rather than a lapse into jargon. Second, abstracts generally receive the least attention from writers and employ more jargon, likely in an attempt to appeal to journals’ gatekeepers, who read the abstract as a separate submission from a full paper. As a result, abstracts disproportionately feature long sentences, embedded phrases and clauses, and the jargon that the authors of the bioRxiv article abhor. After reading an abstract heaving with jargon and the stilted journal-speak that can make reading journal articles an exercise in stylistic torture, I recently braced myself to spend a full day offering editorial suggestions on a draft article from a colleague who usually writes with wit and bracing prose. But once I cleared the abstract’s final sentence and plunged into the article itself, the jargon and tortuous sentences evaporated, replaced by concrete, reader-friendly prose.

The bioRxiv study also had a key limitation: it relied entirely on two readability formulas, Flesch Reading Ease and the New Dale-Chall Readability Formula. Flesch relies on a formula that measures the number of syllables per word and the number of words per sentence. Similarly, New Dale-Chall measures the number of words in a sentence and the percentage of “difficult words.” Both formulas set out to assess which texts were appropriate for primary school learners and also gained widespread use as a mechanism for assessing primary school writers’ development. But even in these settings, the formulas, however daunting they look on paper [Dale-Chall reads, “Raw Score = 0.1579 × (percentage of difficult words) + 0.0496 × (average sentence length in words)”], fail to distinguish the challenges inherent in a 25-word sentence with 11 embedded phrases and clauses from a 25-word sentence containing only two clauses. Moreover, Flesch assigns the same score to a sentence containing a three-syllable word like revanchist as to one containing basketball. We know which sentence will take readers longer to digest.
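To see just how little these formulas take into account, here is a minimal Python sketch of both scores. The syllable counter is a naive vowel-group heuristic and the short set of “familiar” words stands in for the real 3,000-word Dale-Chall list; both are simplifications for illustration, not a reconstruction of the study’s code.

import re

FAMILIAR_WORDS = {"the", "a", "movement", "gained", "ground"}  # placeholder for the 3,000-word list

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a crude stand-in for a dictionary lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

def dale_chall_raw_score(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    pct_difficult = 100 * sum(w.lower() not in FAMILIAR_WORDS for w in words) / len(words)
    return 0.1579 * pct_difficult + 0.0496 * (len(words) / len(sentences))

# Two sentences with identical word and syllable counts receive identical scores,
# even though "revanchist" is far rarer than "basketball".
s1 = "The revanchist movement gained ground."
s2 = "The basketball movement gained ground."
print(flesch_reading_ease(s1), flesch_reading_ease(s2))    # same value
print(dale_chall_raw_score(s1), dale_chall_raw_score(s2))  # same value, since neither key word is "familiar"

Because both sentences contain the same number of words and, by this count, syllables, Flesch cannot tell them apart, and Dale-Chall can only do so if one of the key words happens to sit on the familiar-word list.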

At the same time, New Dale-Chall redressed a shortcoming of the original Dale-Chall 763-word list of “familiar” words by expanding that list to 3,000 words. Note, though, that this list still represents words that 80% of American fifth graders can easily comprehend. As a result, nearly every word in a scientific abstract registers as “difficult”: researchers would be hard-pressed to identify a single sentence in most scientific papers that relied strictly on a fifth grader’s easy vocabulary.

Instead, the researchers should have turned to a software platform like Lexile, which draws on a corpus of over 1 billion words to assign ease or difficulty scores to words based on the frequency with which they appear in English publications, books, and websites. The researchers would also need to measure other indicators of complexity, including sentences’ structural features such as the length and number of phrases and clauses per sentence, using a tool like the one created by Haiyang Ai (2010).
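For the curious, here is what that alternative might look like in rough Python form. This is only the general idea, frequency-based word difficulty plus a structural measure, not Lexile’s proprietary algorithm or Ai’s actual tool; the frequency table and the list of clause markers below are hypothetical stand-ins for a billion-word corpus and a full syntactic parse.

import re

CORPUS_FREQ_PER_MILLION = {   # hypothetical frequencies, for illustration only
    "basketball": 25.0,
    "revanchist": 0.02,
    "exon": 0.3,
    "ground": 120.0,
}

CLAUSE_MARKERS = {"that", "which", "who", "because", "although", "while", "if", "when"}

def rarity_score(word: str) -> float:
    """Rarer words get higher scores; words missing from the table are treated as very rare."""
    freq = CORPUS_FREQ_PER_MILLION.get(word.lower(), 0.01)
    return 1.0 / freq

def clauses_per_sentence(sentence: str) -> int:
    """Crude proxy for embedding: one main clause plus one per subordinator or relative pronoun."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return 1 + sum(w in CLAUSE_MARKERS for w in words)

# Unlike Flesch, a frequency-based score distinguishes "revanchist" from "basketball" ...
print(rarity_score("revanchist") > rarity_score("basketball"))  # True
# ... and a clause count distinguishes a heavily embedded sentence from a flatter one of similar length.
nested = ("The findings, which the authors, who had anticipated criticism, reported cautiously, "
          "suggest that readability, although it declined, can recover if editors intervene.")
flat = ("The authors anticipated criticism and reported the findings cautiously, and readability "
        "declined over many decades, and editors can still intervene and help it recover soon.")
print(clauses_per_sentence(nested), clauses_per_sentence(flat))  # the embedded sentence scores far higher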

While academic articles may be getting more difficult to read, counting syllables and checking words against a fifth grader’s vocabulary fail to tell us how a 2017 article in PLoS One differs in difficulty from an 1881 article in Nature.