
AI Could Help Predict Alzheimer’s Disease Early Using Language

IBM and Pfizer researchers develop a machine learning linguistic AD biomarker.

Posted October 22, 2020 | Reviewed by Matt Huston


Can linguistics, the scientific study of language, be used with artificial intelligence (AI) to detect early signs of Alzheimer’s disease (AD)? Scientists from IBM Research and Pfizer created a novel machine learning model that uses linguistic features to help predict the onset of Alzheimer’s disease ahead of a diagnosis, and published the results today in EClinicalMedicine, a journal published by The Lancet.

“The results suggest that language performance in naturalistic probes expose subtle early signs of progression to AD in advance of clinical diagnosis of impairment,” reported the scientists.

There is a pressing need for inexpensive, accurate markers for early detection of Alzheimer’s disease, a fatal disease with no cure. Alzheimer’s disease affects roughly 5.8 million Americans, two-thirds of whom are women, according to the Alzheimer’s Association’s 2020 Alzheimer’s Disease Facts and Figures report. By 2050, around 14 million Americans will be living with Alzheimer’s disease, and the estimated cost is projected to reach $1.1 trillion, per the same report.

The researchers at IBM Research and Pfizer demonstrated proof-of-concept of the strong predictive capabilities of a machine learning model that uses linguistics as a marker for early detection of Alzheimer’s disease. “The mean time to diagnosis of mild AD was 7.59 years,” wrote the researchers.

Linguistics spans several areas of study: psycholinguistics (the psychology of language), meaning (semantics and pragmatics), structure (syntax and morphology), sound (phonology and phonetics), and historical linguistics (the study of languages over time). For this study, the researchers examined linguistic variables related to psycholinguistics, verbosity, lexical richness, repetitiveness, punctuation, spelling, word sequences, and the complexity of both syntax and semantics.
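To make a few of these variables concrete, here is a minimal Python sketch of how surface measures such as verbosity, lexical richness, and repetitiveness might be computed from a written sample. This is not the researchers’ code: the function name `linguistic_features`, the specific formulas (word count, type-token ratio, share of repeated words), and the sample sentence are all assumptions chosen for illustration, and the study’s 87 variables are not reproduced here.

```python
import re
from collections import Counter


def linguistic_features(text: str) -> dict:
    """Compute three illustrative surface features from a written sample.

    Hypothetical helper: the formulas below are common textbook proxies,
    not the definitions used in the study.
    """
    # Lowercase the text and keep runs of letters/apostrophes as word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {"verbosity": 0, "lexical_richness": 0.0, "repetitiveness": 0.0}

    distinct = len(Counter(tokens))
    return {
        # Verbosity: how many words the writer produced.
        "verbosity": total,
        # Lexical richness: type-token ratio (distinct words / all words).
        "lexical_richness": distinct / total,
        # Repetitiveness: share of tokens that repeat an earlier word.
        "repetitiveness": (total - distinct) / total,
    }


# Made-up sample text, loosely in the style of a picture description.
sample = "The boy is on the stool. The boy is taking the cookie."
features = linguistic_features(sample)
print(features)
```

In a real pipeline, measures like these would be only a small part of the feature set, alongside the spelling, punctuation, and syntactic measures the article mentions.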

Training a machine learning algorithm requires data. What sets this study apart is that the machine learning algorithm predicts future onset of Alzheimer’s disease using data that were collected from cognitively healthy individuals.

The research team of Melissa Naylor, Guillermo Cecchi, Mar Santamaria, Sachin Mathur, and Elif Eyigoz distilled 87 linguistic variables for their AI model. They extracted linguistic elements from written responses to neuropsychological tests. Specifically, the researchers used variables from the screening phase of an early-intervention trial from the Framingham Heart Study (FHS), a longitudinal study that started in 1948 with a large cohort. Participants in the Framingham Heart Study were given neuropsychological tests, including the Boston Diagnostic Aphasia Examination with the cookie-theft picture description task. The Boston Diagnostic Aphasia Examination is a widely used cognitive test for assessing aphasia, a disorder that impairs speech and communication abilities, and it is increasingly used to assess dementia as well.

Of the more than 1,200 participants of the Framingham Heart Study, about 480 were reviewed by a panel for dementia status. From this pool, 80 participants made up the test data set. Forty of these participants developed Alzheimer’s disease symptoms before 85 years of age, and the other 40 did not. All of the samples in the test data set were collected while the participants were still cognitively normal.

The machine learning model could predict Alzheimer’s disease with 70 percent accuracy when using linguistic variables. “Our results demonstrate that it is possible to predict future onset of Alzheimer’s disease using language samples obtained from cognitively normal individuals,” the researchers wrote.
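For readers unfamiliar with the metric, accuracy is simply the share of test cases the model labels correctly. The sketch below shows the calculation on a balanced 80-person test set like the one described above. The individual predictions are invented placeholders chosen to produce a 70 percent score; the article does not report how the model’s errors were distributed, and the function name `accuracy` is a hypothetical helper.

```python
def accuracy(y_true: list, y_pred: list) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# True outcomes: 40 participants who later developed AD (1), 40 who did not (0).
y_true = [1] * 40 + [0] * 40

# ASSUMPTION: made-up predictions, 28 correct in each group (56 of 80 overall).
# The real per-group breakdown is not given in the article.
y_pred = [1] * 28 + [0] * 12 + [0] * 28 + [1] * 12

score = accuracy(y_true, y_pred)
print(f"{score:.0%}")
```

Because the test set is evenly split, random guessing would be expected to score about 50 percent, which is the baseline against which a 70 percent result should be read.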

The researchers also ran predictive experiments using non-linguistic variables as well as a combination of linguistic and non-linguistic ones. The non-linguistic variables included gender, age, education, diabetes, hypertension, APOE E4 allele (apolipoprotein E4), and neuropsychological tests (NP). Apolipoprotein E4 has been associated with increased risk of developing Alzheimer’s after age 65.

“Moreover, we showed that using linguistic variables from a single administration of the cookie-theft picture description task performed better than predictive models that incorporated APOE, demographic variables, and NP test results.” In other words, the linguistic-only machine learning model's predictive capabilities outperformed both the non-linguistic variables alone and the combination of linguistic with non-linguistic features.

Currently, diagnostic biomarkers for Alzheimer’s disease often involve cumbersome and time-consuming medical tests. For example, diagnostic testing for AD may include magnetic resonance imaging (MRI) or positron emission tomography (PET) scans, as well as invasive draws of cerebrospinal fluid or blood. With this innovative approach harnessing the predictive power of machine learning, scientists have opened the door to the possibility of non-invasive, easy-to-administer diagnostic tests based on linguistics for early detection of Alzheimer’s disease in the future.
