AI Translates Human Brain Signals to Text

UCSF neuroscientists' brain-computer interface decodes at natural-speech rates.

Posted Apr 05, 2020

Source: Geralt/Pixabay

Can technology and neuroscience one day enable people to silently type using only the mind? Scientists are now one step closer to a computer interface driven by human thoughts. Neuroscientists at the University of California, San Francisco (UCSF) published a study last week in Nature Neuroscience showing that their brain-computer interface (BCI) can translate human brain activity into text with relatively high accuracy and at natural-speech rates using machine learning, a branch of artificial intelligence (AI).

Neuroscience researchers Edward Chang, David Moses and Joseph Makin at the UCSF Center for Integrative Neuroscience and Department of Neurological Surgery conducted their breakthrough study with funding in part from Facebook Reality Labs. Three years ago at F8, an annual developer event focused on the future of technology, Facebook announced its initiatives in developing brain-computer interfaces by supporting a team of UCSF researchers who aim to help patients with brain damage communicate. Ultimately, Facebook’s vision is to create a wearable device that non-invasively enables people to type by imagining themselves talking.

To achieve their recent breakthrough, the UCSF researchers decoded one sentence at a time, similar to how modern machine translation algorithms work. To test their hypothesis, they trained a model using brain signals from electrocorticograms (ECoGs) recorded during speech production, together with transcriptions of the corresponding spoken sentences. They used a restricted language limited to 30 to 50 unique sentences.

The participants of the study were four consenting patients at UCSF Medical Center who were already undergoing treatment for epilepsy and were under clinical monitoring for seizures. The participants read aloud sentences displayed on a computer screen. Two participants read from a set of 30 picture-description sentences containing around 125 unique words. The remaining two read sentences from the MOCHA-TIMIT dataset, which has 460 sentences and 1,800 unique words, in blocks of 50 sentences (60 in the final block).

As the participants read out loud, their brain activity was recorded using ECoG arrays of 120–250 electrodes that were surgically implanted on each patient’s cortical surface. Specifically, three participants were implanted with 256-channel grids over perisylvian cortices, and one participant with a 128-channel grid located dorsal to the Sylvian fissure.

The ECoG array provided input data to an encoder-decoder-style artificial neural network (ANN), which processed the sequences in three phases.

In the first phase, the ANN learns temporal convolutional filters to downsample the signals from the ECoG data. This is intended to address a potential limitation of a feed-forward network: similar features may occur at different points in the sequence of ECoG data. The filters produce a hundred feature sequences.
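To make the idea of downsampling with temporal filters concrete, here is a minimal sketch in plain Python. It is not the researchers' code: the function name, the two hand-written filters, and the toy two-channel signal are all assumptions for illustration, whereas a trained network would learn its filter weights from ECoG data.

```python
def temporal_conv_downsample(signal, filters, stride):
    """Apply strided 1-D convolutions along the time axis.

    signal:  list of T time steps, each a list of C channel values
    filters: list of F filters; each filter is W taps, each tap C weights
    stride:  hop between successive windows (stride > 1 downsamples)
    Returns a list of output steps, each a list of F feature values.
    """
    width = len(filters[0])
    out = []
    for start in range(0, len(signal) - width + 1, stride):
        window = signal[start:start + width]
        step = []
        for filt in filters:
            total = 0.0
            # The same weights are reused at every window position.
            for taps, sample in zip(filt, window):
                total += sum(w * x for w, x in zip(taps, sample))
            step.append(total)
        out.append(step)
    return out


# Toy input (made up): 12 time steps, 2 channels.
signal = [[float(t), 1.0] for t in range(12)]

# Two illustrative filters of width 4 that only look at channel 0.
average = [[0.25, 0.0]] * 4
difference = [[-1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]

features = temporal_conv_downsample(signal, [average, difference], stride=4)
print(len(signal), "steps in,", len(features), "steps out")
```

With a stride of 4, twelve input steps shrink to three output steps, and each output step holds one value per filter. Because the same filter slides across the whole recording, a pattern is detected in the same way wherever it occurs in time.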

In the next phase, these sequences are passed to the encoder recurrent neural network (RNN), which learns to summarize them in a final hidden state that provides a high-dimensional encoding of the entire sequence.
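The sketch below illustrates how a recurrent network can compress a sequence of any length into one fixed-size hidden state. It assumes a simple tanh recurrent layer with random, untrained weights, which is a stand-in chosen for brevity and not necessarily the recurrent unit used in the study; the sizes and names are likewise hypothetical.

```python
import math
import random


def encode(sequence, w_in, w_rec, bias):
    """Run a simple tanh recurrent layer and return its final hidden state."""
    hidden = [0.0] * len(bias)
    for features in sequence:
        new_hidden = []
        for i in range(len(bias)):
            pre = bias[i]
            pre += sum(w * x for w, x in zip(w_in[i], features))   # current input
            pre += sum(w * h for w, h in zip(w_rec[i], hidden))    # previous state
            new_hidden.append(math.tanh(pre))
        hidden = new_hidden
    return hidden


# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = random.Random(0)
n_features, n_hidden = 3, 4
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)] for _ in range(n_hidden)]
w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
bias = [0.0] * n_hidden

short_seq = [[0.1, 0.2, 0.3]] * 5
long_seq = [[0.1, 0.2, 0.3]] * 50
summary_short = encode(short_seq, w_in, w_rec, bias)
summary_long = encode(long_seq, w_in, w_rec, bias)
print(len(summary_short), len(summary_long))
```

Both the 5-step and the 50-step sequence end up as a vector of the same length, which is what lets a downstream decoder work from a single summary of the whole sentence.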

In the last phase, the high-dimensional state produced by the encoder RNN is transformed by a decoder recurrent neural network. This second recurrent neural network learns to predict the next word in a sequence.
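Next-word prediction can be sketched as a loop that repeatedly asks the decoder for scores over the vocabulary and keeps the most probable word. Everything here is an assumption for illustration: `toy_step` is a hand-written lookup table standing in for a trained decoder RNN, the four-word vocabulary is made up, and greedy selection is one simple decoding strategy among several.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(step_fn, state, vocab, max_len=10):
    """Emit the most probable next word until the end token or max_len."""
    words = []
    prev = "<s>"  # start-of-sentence marker
    for _ in range(max_len):
        scores, state = step_fn(prev, state)
        probs = softmax(scores)
        best = max(range(len(vocab)), key=lambda i: probs[i])
        prev = vocab[best]
        if prev == "</s>":  # end-of-sentence marker
            break
        words.append(prev)
    return words


vocab = ["the", "dog", "barks", "</s>"]
# Hypothetical stand-in for a trained decoder: a fixed next-word table.
NEXT_WORD = {"<s>": "the", "the": "dog", "dog": "barks", "barks": "</s>"}


def toy_step(prev_word, state):
    scores = [0.0] * len(vocab)
    scores[vocab.index(NEXT_WORD[prev_word])] = 5.0
    # A real decoder would also update its hidden state here.
    return scores, state


encoder_summary = [0.0, 0.0, 0.0, 0.0]  # placeholder for the encoder's final state
decoded = greedy_decode(toy_step, encoder_summary, vocab)
print(" ".join(decoded))
```

In a trained system the step function would be the decoder RNN itself, starting from the encoder's final hidden state and updating it after every word.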

Overall, the neural network is trained so that the encoder’s output values are near the target mel-frequency cepstral coefficients (MFCCs), a compact representation of the speech audio, while at the same time the decoder assigns high probability to each target word. Training is done using stochastic gradient descent with backpropagation.
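One plausible way to express this two-part training objective is as a sum of two losses, sketched below. The choice of mean squared error for the MFCC term, the `mfcc_weight` parameter, and all the numbers are assumptions for illustration; the article does not specify how the two terms are measured or balanced.

```python
import math


def mfcc_loss(predicted, target):
    """Mean squared error between predicted and target MFCC frames."""
    count = sum(len(frame) for frame in target)
    squared = sum(
        (p - t) ** 2
        for pred_frame, target_frame in zip(predicted, target)
        for p, t in zip(pred_frame, target_frame)
    )
    return squared / count


def word_loss(word_probs, target_ids):
    """Average negative log-probability assigned to each target word."""
    total = sum(math.log(probs[i]) for probs, i in zip(word_probs, target_ids))
    return -total / len(target_ids)


def joint_loss(pred_mfcc, target_mfcc, word_probs, target_ids, mfcc_weight=1.0):
    """Combine both objectives; mfcc_weight is a hypothetical balancing knob."""
    return word_loss(word_probs, target_ids) + mfcc_weight * mfcc_loss(pred_mfcc, target_mfcc)


# Made-up example: two MFCC frames of two coefficients, two target words.
target_mfcc = [[0.0, 1.0], [1.0, 3.0]]
pred_mfcc = [[0.0, 1.0], [1.0, 1.0]]
target_ids = [0, 1]
unsure = [[0.5, 0.5], [0.5, 0.5]]
confident = [[0.9, 0.1], [0.1, 0.9]]

print(joint_loss(pred_mfcc, target_mfcc, unsure, target_ids))
print(joint_loss(pred_mfcc, target_mfcc, confident, target_ids))
```

The loss falls when the decoder puts more probability on the correct words and when the predicted MFCC frames move closer to the targets. Gradient descent adjusts the network's weights to push this single number down.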

The researchers reported that their system achieved higher accuracy than other existing brain-machine interfaces. With their technique, speech could be decoded from ECoG data with word error rates as low as three percent on datasets with 250-word vocabularies. By comparison, according to the UCSF researchers, other existing brain-machine interfaces were limited to “decoding correctly less than 40% of words.” According to the researchers, what sets this solution apart is that their neural network has learned to “identify words, not just sentences, from ECoG data, and therefore that generalization to decoding of novel sentences is possible.”
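Word error rate is commonly computed as the minimum number of word substitutions, insertions and deletions needed to turn the decoded sentence into the reference sentence, divided by the number of words in the reference. The sketch below uses that standard definition; the function name and example sentences are made up, and the study's exact evaluation procedure may differ.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    Assumes the reference contains at least one word.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # a reference word is missing (deletion)
                curr[j - 1] + 1,     # an extra word was decoded (insertion)
                prev[j - 1] + cost,  # words match or one is substituted
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the dog barks loudly", "the dog barks loudly"))
print(word_error_rate("the dog barks loudly", "the cat barks loudly"))
```

A perfect transcript scores 0, and one wrong word in a four-word sentence scores 0.25. On this scale, a three percent word error rate corresponds to roughly three word errors per hundred reference words.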

Today, information is transmitted to computing devices by speech, touch screens and keyboards. Will smartphones and other computing devices one day be guided by thinking, rather than typing, touching or speaking? Through the interdisciplinary combination of neuroscience and AI machine learning, scientists are further along in developing technologies that may not only help those with locked-in syndrome and speech disabilities, but also transform how we all interact and engage with smartphones and computing devices in the not-so-distant future.

Copyright © 2020 Cami Rosso All rights reserved.