
Using Automatic Speech Recognition to Measure the Intelligibility of Speech Synthesized from Brain Signals.

Abstract

Brain-computer interfaces (BCIs) can potentially restore lost function in patients with neurological injury. A promising new application of BCI technology has focused on speech restoration. One approach is to synthesize speech from the neural correlates of a person who cannot speak as they attempt to do so. However, there is no established gold standard for quantifying the quality of BCI-synthesized speech. Quantitative metrics, such as computing correlation coefficients between true and decoded speech, are not applicable to anarthric users and fail to capture intelligibility to actual human listeners; by contrast, methods in which people complete forced-choice questionnaires are imprecise, impractical at scale, and cannot be used as cost functions for improving speech decoding algorithms. Here, we present a deep learning-based “AI Listener” that can evaluate BCI speech intelligibility objectively, rapidly, and automatically. We begin by adapting several leading Automatic Speech Recognition (ASR) deep learning models – DeepSpeech, Wav2vec 2.0, and Kaldi – to suit our application. We then evaluate the performance of these ASR models on multiple speech datasets with varying levels of intelligibility, including healthy speech, speech from people with dysarthria, and synthesized BCI speech. Our results demonstrate that the multilingual ASR model XLSR-Wav2vec 2.0, trained to output phonemes, yields the best speech transcription accuracy. Notably, the AI Listener reports that several previously published BCI output datasets are not intelligible, consistent with assessments by human listeners.
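The abstract describes the approach only at a high level; as a rough sketch of how such an "AI Listener" might be applied, the snippet below transcribes a synthesized waveform with a pretrained phoneme-level XLSR-Wav2vec 2.0 checkpoint and scores it with a phoneme error rate (PER) against the intended utterance. The checkpoint name, audio file, and reference phoneme string are illustrative assumptions, not the authors' exact pipeline.

```python
# Rough sketch of an "AI Listener"-style intelligibility score: transcribe a
# synthesized waveform with a phoneme-level XLSR-Wav2vec 2.0 ASR model and
# compute a phoneme error rate (PER) against the intended utterance.
# The checkpoint, file name, and reference phonemes are illustrative
# assumptions, not the authors' exact setup.

import torch
import soundfile as sf
from transformers import AutoProcessor, Wav2Vec2ForCTC

MODEL_ID = "facebook/wav2vec2-xlsr-53-espeak-cv-ft"  # assumed phoneme-output XLSR checkpoint

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()


def transcribe_phonemes(wav_path: str) -> list[str]:
    """Return the phoneme sequence the ASR model hears in a 16 kHz mono audio file."""
    audio, sr = sf.read(wav_path)  # resample to 16 kHz beforehand if needed
    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(ids)[0].split()


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two symbol sequences (single-row DP)."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, row[0] = row[0], i
        for j, h in enumerate(hyp, 1):
            prev, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, prev + (r != h))
    return row[-1]


def phoneme_error_rate(ref: list[str], hyp: list[str]) -> float:
    """PER = edit distance normalized by reference length; lower means more intelligible."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


# Usage: compare what the model hears against what the speaker intended to say.
hyp = transcribe_phonemes("bci_synthesized_utterance.wav")  # hypothetical file
ref = "h ɛ l oʊ w ɜː l d".split()                           # intended phonemes (assumed)
print(f"PER: {phoneme_error_rate(ref, hyp):.2f}")
```

Because such a score is produced automatically, it scales to large evaluation sets and can serve as an objective for comparing or tuning speech decoders, which is the role the abstract assigns to the AI Listener.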
