An ideal quantized mask to increase intelligibility and quality of speech in noise.

Journal

The Journal of the Acoustical Society of America

Abstract

Time-frequency (T-F) masks represent powerful tools to increase the intelligibility of speech in background noise. Translational relevance is provided by their accurate estimation based only on the signal-plus-noise mixture, using deep learning or other machine-learning techniques. In the current study, a technique is designed to capture the benefits of existing techniques. In the ideal quantized mask (IQM), speech and noise are partitioned into T-F units, and each unit receives one of N attenuations according to its signal-to-noise ratio. It was found that as few as four to eight attenuation steps (IQM4, IQM8) improved intelligibility over the ideal binary mask (IBM, having two attenuation steps), and equaled the intelligibility resulting from the ideal ratio mask (IRM, having a theoretically infinite number of steps). Sound-quality ratings and rankings of noisy speech processed by the IQM4 and IQM8 were also superior to that processed by the IBM and equaled or exceeded that processed by the IRM. It is concluded that the intelligibility and sound-quality advantages of infinite attenuation resolution can be captured by an IQM having only a very small number of steps. Further, the classification-based nature of the IQM might provide algorithmic advantages over the regression-based IRM during machine estimation.
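The quantization idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the attenuation values or the signal-to-noise boundaries used for each step, so the sketch below makes its own assumptions: it takes the speech and noise power in each T-F unit as given, computes a common square-root form of the IRM gain, and rounds that gain to the nearest of N uniformly spaced levels between 0 and 1. The function names `ideal_ratio_mask` and `ideal_quantized_mask` are hypothetical, and the uniform spacing is an illustrative choice, not the study's method.

```python
import math


def ideal_ratio_mask(speech_power, noise_power, eps=1e-12):
    """Gain in [0, 1] per T-F unit: sqrt(S / (S + N)).

    Assumption: the square-root form of the ratio mask; eps only guards
    against division by zero in units where both powers are zero.
    """
    return [
        [math.sqrt(s / (s + n + eps)) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def ideal_quantized_mask(speech_power, noise_power, n_steps):
    """Round each ratio-mask gain to the nearest of n_steps uniform levels.

    Assumption: levels are evenly spaced on [0, 1]; the study may place
    its attenuation steps differently.
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    top = n_steps - 1
    irm = ideal_ratio_mask(speech_power, noise_power)
    # floor(x + 0.5) rounds to the nearest level index, halves rounding up.
    return [[math.floor(g * top + 0.5) / top for g in row] for row in irm]


# Toy 2 x 2 grid of T-F units (rows: frequency channels, columns: time frames).
speech = [[0.0, 1.0], [1.0, 9.0]]
noise = [[1.0, 1.0], [4.0, 1.0]]

for n in (2, 4, 8):
    print(n, ideal_quantized_mask(speech, noise, n))
```

With two steps this sketch reduces to a binary mask, and as the number of steps grows the quantized gains approach the ratio-mask gains. Each unit's target is also one of N discrete classes rather than a continuous value, which is what makes estimating the IQM a classification problem.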
