MelTrans: Mel-Spectrogram Relationship-Learning for Speech Emotion Recognition via Transformers.

Abstract

Speech emotion recognition (SER) is both a ubiquitous aspect of everyday communication and a central focus of human-computer interaction research. However, SER faces several challenges, including the difficulty of detecting subtle emotional nuances and of recognizing speech emotions in noisy environments. To address these challenges, we introduce MelTrans, a Transformer-based model designed to distill critical clues from speech data by learning core features and long-range dependencies. At the heart of our approach is a dual-stream framework. Built on the Transformer architecture, MelTrans captures broad dependencies within speech mel-spectrograms, facilitating a nuanced understanding of the emotional cues embedded in speech signals. Comprehensive experimental evaluations on the EmoDB (92.52%) and IEMOCAP (76.54%) datasets demonstrate the effectiveness of MelTrans. These results highlight MelTrans’s ability to capture critical cues and long-range dependencies in speech data, setting a new benchmark on these two datasets and showing that the model addresses the complex challenges posed by SER tasks.
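To make the idea of learning long-range dependencies within a mel-spectrogram concrete, the following is a minimal NumPy sketch and not the authors' implementation. It computes a log-mel spectrogram from a synthetic signal and applies one single-head self-attention pass over the time frames, so that every frame can attend to every other frame. The function names, the sampling rate, the FFT and hop sizes, the number of mel bands, and the random projection weights are all illustrative assumptions. The abstract does not describe the dual-stream design in detail, so it is not modeled here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Return a log-mel spectrogram of shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the frame axis."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

# Synthetic one-second "utterance": a 220 Hz tone plus noise (assumption).
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t) + 0.1 * rng.standard_normal(sr)

mel = log_mel_spectrogram(signal, sr=sr)
mel = (mel - mel.mean()) / mel.std()  # standardize before attention

d_model = 32  # hypothetical embedding width
w_q, w_k, w_v = (
    rng.standard_normal((mel.shape[1], d_model)) * 0.1 for _ in range(3)
)
context, weights = self_attention(mel, w_q, w_k, w_v)
print(mel.shape, context.shape, weights.shape)
```

Each row of `weights` is a distribution over all frames of the utterance, which is the mechanism that lets a Transformer relate distant parts of a spectrogram. A trained model would learn the projection matrices and stack several such layers with a classification head on top.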
