

Neuroscience Breakthrough: AI Translates Thought-to-Speech

Researchers develop a system that translates brain signals into speech using AI.

Source: orla/istockphoto

First there was the keyboard, then touch and voice, as ways to control computing devices and apps. What's next? Researchers at the Mortimer B. Zuckerman Mind Brain Behavior Institute at Columbia University in New York City announced “a scientific first” with their development of a brain-computer interface (BCI) that translates human thought into speech with greater clarity and precision than existing solutions. The research team, led by Nima Mesgarani, Ph.D., published its findings on January 29, 2019, in Scientific Reports, a Nature research journal.

A brain-computer interface is a bidirectional communication route between a brain and computer. Many BCI research projects are centered on neuroprosthetic uses for people who have lost or impaired movement, vision, hearing, or speech, such as those impacted by stroke, spinal cord injuries, amyotrophic lateral sclerosis (ALS), aphasia (speech impairment due to brain damage), cochlear damage, and locked-in syndrome.

Until this breakthrough, the process for decoding brain signals relied on simpler computing models based on linear regression to analyze visual representations of sound frequencies (spectrograms), which produced unintelligible speech. Mesgarani and his research team combined the latest speech-synthesis technology with AI deep learning to significantly improve the intelligibility of the reconstructed speech.
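To make the distinction concrete, here is a minimal sketch, not the team's actual code, contrasting the two modeling approaches on synthetic data: a linear regression baseline versus a small deep neural network, each mapping simulated neural features to spectrogram frames. All shapes, sizes, and hyperparameters below are illustrative assumptions.

```python
# Illustrative sketch only: maps simulated "neural features" to spectrogram frames,
# contrasting a linear-regression baseline with a deep (nonlinear) regressor.
# Shapes and hyperparameters are assumptions, not the study's actual configuration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-ins: 2,000 time frames of 128 neural features,
# each related nonlinearly to a 32-bin spectrogram frame.
X = rng.standard_normal((2000, 128))                          # neural activity features
W = rng.standard_normal((128, 32))
Y = np.tanh(X @ W) + 0.1 * rng.standard_normal((2000, 32))    # spectrogram frames

X_train, X_test = X[:1500], X[1500:]
Y_train, Y_test = Y[:1500], Y[1500:]

# Baseline: linear regression from neural features to spectrogram bins.
linear = LinearRegression().fit(X_train, Y_train)

# Nonlinear regression: a small multilayer (deep) network.
dnn = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=500,
                   random_state=0).fit(X_train, Y_train)

print("linear R^2:", r2_score(Y_test, linear.predict(X_test)))
print("DNN R^2:   ", r2_score(Y_test, dnn.predict(X_test)))
```

On data with a nonlinear neural-to-spectrogram relationship, the deep network typically reconstructs the frames more faithfully than the linear baseline, which is the intuition behind the team's switch in method.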

For the study, Mesgarani partnered with neurosurgeon Ashesh Dinesh Mehta, MD, Ph.D., at Northwell Health Physician Partners Neuroscience Institute to measure the brain activity of patients with pharmacoresistant focal epilepsy who were already undergoing brain surgery.

Invasive electrocorticography (ECoG) was used to measure the neural activity of five study participants, all of whom self-reported normal hearing, while they listened to four speakers read short stories for half an hour. The recorded neural patterns were used as training data for a vocoder, an audio processor that analyzes and synthesizes the human voice.

After training the vocoder, the researchers recorded brain signals from the same participants while they listened to speakers count from zero to nine. These recorded brain signals were fed through the vocoder, which in turn produced synthesized speech. Next, the researchers used artificial neural networks to refine the speech produced by the vocoder, and then had 11 subjects with normal hearing listen to the output.
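The steps above can be summarized in a short, runnable outline. This is only a conceptual sketch: synthetic arrays stand in for ECoG recordings, spectrograms, and audio, and the function names, shapes, and trivial least-squares decoder are hypothetical placeholders, not the study's methods.

```python
# A compact, runnable outline of the pipeline described above, using synthetic
# arrays in place of ECoG and audio. Names, shapes, and the trivial decoder are
# illustrative placeholders only; they are not the study's actual methods.
import numpy as np

rng = np.random.default_rng(1)

def record_ecog(n_frames, n_channels=64):
    """Stand-in for ECoG features recorded while a participant listens."""
    return rng.standard_normal((n_frames, n_channels))

def fit_decoder(ecog, spectrogram):
    """Least-squares mapping from neural features to spectrogram frames
    (placeholder for the vocoder/DNN training step)."""
    weights, *_ = np.linalg.lstsq(ecog, spectrogram, rcond=None)
    return weights

def vocode(spectrogram):
    """Placeholder 'vocoder': pretend each frame becomes one audio sample."""
    return spectrogram.mean(axis=1)

# 1. Training phase: stories are heard; ECoG and target spectrograms are recorded.
story_ecog = record_ecog(2000)
story_spec = rng.standard_normal((2000, 32))      # stand-in spectrogram frames
decoder = fit_decoder(story_ecog, story_spec)

# 2. Test phase: ECoG recorded while digits 0-9 are heard is decoded and vocoded.
digit_ecog = record_ecog(200)
synthesized_audio = vocode(digit_ecog @ decoder)

# 3. In the study, 11 listeners then identified and repeated the digits;
#    that human evaluation step cannot be simulated here.
print("synthesized audio frames:", synthesized_audio.shape)
```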

The researchers found that using a deep neural network (DNN) for nonlinear regression improved intelligibility by 67 percent over the baseline method of using linear regression to reconstruct the auditory spectrogram. The 11 listeners could understand and repeat the sounds generated with the DNN-vocoder combination with 75 percent accuracy.

The researchers found a “general framework that can be used for speech neuroprosthesis technologies that can result in accurate and intelligible reconstructed speech from the human auditory cortex.” They view their research as “a step toward the next generation of human-computer interaction systems and more natural communication channels for patients suffering from paralysis and locked-in syndromes.”

Copyright © 2019 Cami Rosso All rights reserved.

References

Akbari, Hassan, Khalighinejad, Bahar, Herrero, Jose L., Mehta, Ashesh D., & Mesgarani, Nima. “Towards reconstructing intelligible speech from the human auditory cortex.” Scientific Reports, January 29, 2019.
