AI Researchers Decode Speech From Thoughts Without Brain Surgery
AI decodes speech from brain activity using noninvasive recordings.
Posted September 13, 2022 | Reviewed by Kaja Perina
The advanced pattern-recognition capabilities of AI deep learning offer a new glimmer of hope for a noninvasive solution to help patients who are unable to speak due to neurodegenerative diseases or injuries to the brain or spinal cord. A new study by researchers affiliated with Meta AI shows how deep learning can decode speech from noninvasive recordings of brain activity, a step toward an alternative to solutions that require invasive open brain surgery to implant brain-computer interface devices.
“Decoding language from brain activity is a long-awaited goal in both healthcare and neuroscience,” wrote the research team of Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, and Jean-Rémi King. “Major milestones have recently been reached thanks to intracranial devices: subject-specific pipelines trained on invasive brain responses to basic language tasks now start to efficiently decode interpretable features (e.g. letters, words, spectrograms). However, scaling this approach to natural speech and non-invasive brain recordings remains a major challenge.”
To address this challenge, the Meta AI researchers used deep learning, specifically a convolutional neural network (CNN), to help decode brain activity from data captured without requiring open brain surgery. The pipeline used wav2vec 2.0, an open-source pretrained self-supervised model developed in 2020 by the Facebook (now Meta) AI team of Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli.
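To give a flavor of the kind of operation a convolutional network applies to multichannel brain recordings, here is a minimal, purely illustrative sketch. All of it is invented for this example: the channel and sample counts, the random "signal," and the single hand-rolled filter. The actual Meta AI model is far larger and learns its filters from data.

```python
import numpy as np

# Simulated noninvasive recording: 4 sensor channels, 300 time samples.
# (Real MEG/EEG systems record hundreds of channels; values here are random.)
rng = np.random.default_rng(0)
brain_signal = rng.standard_normal((4, 300))

# A single convolutional filter spanning all 4 channels and 5 time steps.
kernel = rng.standard_normal((4, 5))

def conv1d_valid(signal, kernel):
    """Slide the kernel along the time axis, summing over channels,
    to produce one feature time series ('valid' convolution)."""
    n_channels, n_samples = signal.shape
    k_len = kernel.shape[1]
    out_len = n_samples - k_len + 1
    out = np.empty(out_len)
    for t in range(out_len):
        out[t] = np.sum(signal[:, t:t + k_len] * kernel)
    return out

features = conv1d_valid(brain_signal, kernel)
print(features.shape)  # (296,)
```

A real CNN stacks many such learned filters, so each window of brain activity is summarized by a vector of features rather than a single number.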
The study gathered data from 169 healthy participants who passively listened to audiobooks and sentences in their native language, either English or Dutch, while their brain activity was recorded noninvasively with either magnetoencephalography (MEG) or electroencephalography (EEG). These data were then fed into an AI model that searched for patterns in the high-dimensional recordings. The goal was for the AI to predict, from the noninvasive brain scans alone, which speech segment each participant was listening to.
The researchers found that the algorithm performed better on the MEG datasets than on the EEG datasets. For MEG, the model achieved a top-10 accuracy of up to 72.5 percent, identifying the correct segment from three seconds of brain activity among more than 1,590 candidates. For EEG, the algorithm outperformed the random baseline but reached only up to 19.1 percent across more than 2,600 segments.
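A top-10 accuracy like the one reported can be understood as a retrieval task: embed each window of brain activity, compare it against embeddings of every candidate speech segment, and check whether the true segment ranks among the ten most similar. The sketch below illustrates only that metric; the embeddings are synthetic random vectors (the correlation between each "brain" vector and its true segment is planted by construction), and every size and noise level is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials, n_segments, dim = 200, 1590, 64

# Invented decoder outputs: one embedding per brain-activity window.
brain_emb = rng.standard_normal((n_trials, dim))

# Invented candidate speech-segment embeddings.
segment_emb = rng.standard_normal((n_segments, dim))

# Plant the signal: each trial's true segment is a noisy copy of its
# brain embedding, so a similarity search can find it.
true_idx = rng.choice(n_segments, size=n_trials, replace=False)
segment_emb[true_idx] = brain_emb + 0.5 * rng.standard_normal((n_trials, dim))

def top_k_accuracy(brain, segments, targets, k=10):
    # Cosine similarity between every brain window and every segment.
    b = brain / np.linalg.norm(brain, axis=1, keepdims=True)
    s = segments / np.linalg.norm(segments, axis=1, keepdims=True)
    sims = b @ s.T                          # shape (n_trials, n_segments)
    # Indices of the k most similar segments for each trial.
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = [targets[i] in top_k[i] for i in range(len(targets))]
    return float(np.mean(hits))

acc = top_k_accuracy(brain_emb, segment_emb, true_idx)
print(f"top-10 accuracy: {acc:.3f}")
```

With the signal planted this strongly, the toy accuracy comes out near 1.0; the point of the sketch is the metric itself, not the difficulty of decoding real brain data.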
As for the societal impact, the Meta AI researchers caution, “Although these results hold great promise for the development of a safe and scalable system to help patients with communication deficits, the scientific community should remain vigilant that it will not be adapted to decode brain signals without the consent of the participants.”
The AI researchers also point out that, unlike other biomarkers such as facial features, DNA, and fingerprints, brain activity recordings from EEG and MEG could not be collected without a participant knowing about it.
Copyright © 2022 Cami Rosso All rights reserved.