Neuroscience Breakthrough: AI Translates Thought-to-Speech

Columbia University’s Brain-Computer Interface is state of the art.

Posted Jan 30, 2019

Source: orla/istockphoto

First there was the keyboard, then touch and voice to control computing devices and apps. What's next? Researchers at the Mortimer B. Zuckerman Mind Brain Behavior Institute at Columbia University in New York City announced “a scientific first” with their invention of a brain-computer interface (BCI) that translates human thought into speech with greater clarity and precision than existing solutions. The research team, led by Nima Mesgarani, Ph.D., published their findings on January 29, 2019 in Scientific Reports, a Nature research journal.

A brain-computer interface is a bidirectional communication route between a brain and a computer. Many BCI research projects center on neuroprosthetic uses for people with lost or impaired movement, vision, hearing, or speech, such as those affected by stroke, spinal cord injuries, amyotrophic lateral sclerosis (ALS), aphasia (speech impairment due to brain damage), cochlear damage, and locked-in syndrome.

Until this landmark breakthrough, the process for decoding brain signals relied on simpler computational models based on linear regression to reconstruct visual representations of sound frequencies (spectrograms), which produced largely unintelligible speech. Mesgarani and his research team combined the latest innovations in speech synthesis with AI deep learning to significantly improve the intelligibility of the reconstructed speech.
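
To make the contrast concrete, here is a minimal sketch of what a linear-regression baseline of this kind can look like: a single linear map from recorded neural features to spectrogram frames. The array shapes, feature names, and use of scikit-learn are illustrative assumptions, not the study's actual data or code.

```python
# Minimal sketch of a linear-regression spectrogram-reconstruction baseline.
# All shapes and variable names are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical training data: 1,000 time frames of 128 neural features
# (e.g., per-electrode band power) paired with 64-bin spectrogram frames.
neural_features = rng.standard_normal((1000, 128))
spectrogram_frames = rng.standard_normal((1000, 64))

# Fit one linear map from neural activity to spectrogram bins.
baseline = LinearRegression()
baseline.fit(neural_features, spectrogram_frames)

# Reconstruct a spectrogram from new neural activity; inverting such a
# spectrogram back to audio is the step that tended to yield unintelligible speech.
new_activity = rng.standard_normal((100, 128))
reconstructed_spectrogram = baseline.predict(new_activity)
print(reconstructed_spectrogram.shape)  # (100, 64)
```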

Mesgarani partnered with neurosurgeon Ashesh Dinesh Mehta, MD, Ph.D., at Northwell Health Physician Partners Neuroscience Institute to measure the brain activity of patients with pharmacoresistant focal epilepsy who were already undergoing brain surgery and who agreed to take part in the study.

Invasive electrocorticography (ECoG) was used to measure the neural activity of five study participants, all of whom self-reported normal hearing, while they listened to four speakers reading short stories for half an hour. The recorded neural patterns were used as training data for a vocoder, an audio processor that analyzes and synthesizes the human voice.

After training the vocoder, the researchers recorded brain signals from the same participants while they listened to speakers counting from zero to nine. These recorded brain signals were fed through the vocoder, which in turn produced synthesized speech. Next, the researchers used artificial neural networks to refine the speech produced by the vocoder, then had 11 subjects with normal hearing listen to the output.
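
The decode-then-synthesize flow described above can be sketched roughly as follows: a trained decoder predicts vocoder parameters from recorded neural activity, and a vocoder turns those parameters into an audible waveform. The toy synthesize() function below is only a stand-in for a real vocoder, and all shapes and names are assumptions.

```python
# Rough sketch of the decode-then-synthesize flow; synthesize() is a toy
# stand-in for a real vocoder, not the one used in the study.
import numpy as np

def synthesize(band_energies, sample_rate=16000, frame_len=0.01):
    """Toy vocoder stand-in: sums sinusoids weighted by per-band energy."""
    n_frames, n_bands = band_energies.shape
    samples_per_frame = int(sample_rate * frame_len)
    t = np.arange(samples_per_frame) / sample_rate
    freqs = np.linspace(200, 4000, n_bands)  # one carrier frequency per band
    frames = [
        (np.clip(frame, 0, None)[:, None] * np.sin(2 * np.pi * freqs[:, None] * t)).sum(axis=0)
        for frame in band_energies
    ]
    return np.concatenate(frames)

# Hypothetical decoder output for brain signals recorded while digits were
# spoken: 500 frames of 32 vocoder band energies.
rng = np.random.default_rng(1)
predicted_params = rng.random((500, 32))
waveform = synthesize(predicted_params)
print(waveform.shape)  # (80000,) — 5 seconds of audio at 16 kHz
```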

The researchers found that using a deep neural network (DNN) with nonlinear regression improved intelligibility by 67 percent over the baseline method of using linear regression to reconstruct the auditory spectrogram. The listeners could understand and repeat the sounds generated with the DNN-vocoder combination with 75 percent accuracy. According to the researchers, this is consistent with “the findings of studies showing the superior advantage of deep learning models over other techniques, particularly when the amount of training data is large,” and they note that “increasing the amount of training data results in better reconstruction accuracy.”
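
A hedged sketch of the nonlinear-regression alternative: swapping the single linear map for a small feed-forward deep neural network. The layer sizes, data, and the choice of scikit-learn's MLPRegressor are illustrative assumptions; the 67 percent intelligibility gain and 75 percent accuracy figures come from the paper, not from this toy model.

```python
# Minimal sketch of a DNN doing nonlinear regression from neural features to
# vocoder parameters. Architecture and data are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
neural_features = rng.standard_normal((1000, 128))  # ECoG-derived features
vocoder_params = rng.random((1000, 32))             # target vocoder parameters

# A small feed-forward network: nonlinear hidden layers instead of one linear map.
dnn = MLPRegressor(hidden_layer_sizes=(256, 256), activation="relu",
                   max_iter=500, random_state=0)
dnn.fit(neural_features, vocoder_params)

predicted = dnn.predict(rng.standard_normal((100, 128)))
print(predicted.shape)  # (100, 32)
```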

The researchers describe a “general framework that can be used for speech neuroprosthesis technologies that can result in accurate and intelligible reconstructed speech from the human auditory cortex.” They view their brain-to-computer system as state-of-the-art and “a step toward the next generation of human-computer interaction systems and more natural communication channels for patients suffering from paralysis and locked-in syndromes.”

The rise of deep learning in artificial intelligence has created a wellspring of possible scientific advances across disciplines, especially in neuroscience and biomedical engineering. In the future, will computing devices be managed by human thought?

Copyright © 2019 Cami Rosso All rights reserved.

References

Akbari, Hassan, Bahar Khalighinejad, Jose L. Herrero, Ashesh D. Mehta, and Nima Mesgarani. “Towards reconstructing intelligible speech from the human auditory cortex.” Scientific Reports, January 29, 2019.