A group of scientists from the Department of Neurological Surgery at the University of California, San Francisco, is trying to restore speech in patients through an AI-driven speech decoder. Traditionally, brain-computer interfaces (BCIs) are used to assist patients whose ability to speak has been disrupted by neurological conditions such as stroke, amyotrophic lateral sclerosis, or paralysis. These devices detect and decode brain signals, which then drive an external machine to type out a person's thoughts.

The mechanisms 

However, this form of communication is inefficient and time-consuming. Patients can often "converse" at no more than about eight words per minute, compared with roughly 150 words per minute in fluent speech. In a recent paper published in Nature, Anumanchipalli and his research team combined BCI with deep learning to produce spoken sentences. They recruited five healthy volunteers to speak hundreds of sentences and tracked neural signals from the brain areas that govern speech and articulator movement.

The collected recordings were combined with data from previous experiments to estimate the movements of vocal-tract articulators (the tongue, lips, larynx, and jaw). These estimated movements were then fed into a deep learning algorithm attached to a decoder, which turned the information into synthetic speech. So far, listeners have been able to comprehend an average of 70% of the synthesized words.
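
To make the two-stage pipeline concrete, here is a minimal sketch in PyTorch of the general idea described above: one network maps neural recordings to articulator kinematics, and a second maps those kinematics to acoustic features for a vocoder. This is not the authors' code; all layer types, sizes, and feature dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NeuralToArticulator(nn.Module):
    """Stage 1: map recorded neural features to estimated vocal-tract
    articulator trajectories (tongue, lips, larynx, jaw)."""
    def __init__(self, n_channels=256, n_articulators=33, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_channels, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_articulators)

    def forward(self, neural):            # neural: (batch, time, n_channels)
        h, _ = self.rnn(neural)
        return self.out(h)                # (batch, time, n_articulators)

class ArticulatorToAcoustics(nn.Module):
    """Stage 2: map estimated kinematics to acoustic features that a
    vocoder could turn into an audible waveform."""
    def __init__(self, n_articulators=33, n_acoustic=32, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_articulators, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, kinematics):        # kinematics: (batch, time, n_articulators)
        h, _ = self.rnn(kinematics)
        return self.out(h)                # (batch, time, n_acoustic)

# Chained together: brain signals -> kinematics -> acoustic features.
stage1, stage2 = NeuralToArticulator(), ArticulatorToAcoustics()
neural = torch.randn(1, 100, 256)         # 100 time steps of placeholder neural data
acoustics = stage2(stage1(neural))        # would then be passed to a vocoder
```

The key design point is the intermediate articulator representation: instead of learning a single mapping from brain activity to sound, the model first predicts physical movements and only then predicts acoustics.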

Yet to leave the lab 

Indeed, the two-stage approach of converting brain signals into estimated movements before turning those movements into audible speech produced less acoustic distortion. However, other neurologists and neuroscientists have questioned its feasibility in a natural setting. After all, the model was trained on a limited vocabulary and a narrow set of speech conditions. Traditional methods such as direct synthesis of acoustic features may therefore still outperform it, especially when the movements of the vocal-tract articulators are minimal.
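
For contrast, the "direct synthesis" alternative mentioned above can be sketched as a single network that maps neural features straight to acoustic features, skipping the intermediate articulator estimate. Again, this is an illustrative assumption, not code from any of the studies discussed.

```python
import torch
import torch.nn as nn

class DirectNeuralToAcoustics(nn.Module):
    """Single-stage baseline: neural features -> acoustic features."""
    def __init__(self, n_channels=256, n_acoustic=32, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_channels, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, neural):            # neural: (batch, time, n_channels)
        h, _ = self.rnn(neural)
        return self.out(h)                # acoustic features for a vocoder

direct = DirectNeuralToAcoustics()
acoustics = direct(torch.randn(1, 100, 256))
```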

Likewise, the need to train the model on vocalized speech means that BCIs relying on this approach may not be applicable to patients who have difficulty making sounds at all. The research team tried to address this challenge by showing that synthetic speech could still be produced when volunteers mimed sentences without sound, but the accuracy was much lower.

Nevertheless, the use of human subjects to produce the speech data and to measure the model's intelligibility puts the study in a favorable position for follow-up in the near future. Human speech production cannot be studied directly in animals, so progress in the field has always been gradual. While AI may facilitate a breakthrough, collaboration among researchers with a wide range of expertise is still needed to bring such a development from the lab into reality.

Author

Andrew Johnson