Fusing traditionally extracted features with deep learned features from the speech spectrogram for anger and stress detection using convolution neural network.

Abstract

Stress and anger are two negative emotions that affect individuals both mentally and physically, so they need to be addressed as early as possible. Automated systems are needed to monitor mental states and to detect early signs of emotional health issues. In the present work, a convolutional neural network (CNN) is proposed for anger and stress detection using handcrafted features together with deep learned features from the spectrogram. The combined feature set gathers information from two different representations of the speech signal, with the aim of obtaining more prominent features and improving recognition accuracy. The proposed method is more computationally efficient than similar approaches to emotion assessment. Preliminary results from an experimental evaluation on three datasets, the Toronto Emotional Speech Set (TESS), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Berlin Emotional Database (EMO-DB), indicate that categorical accuracy is boosted and cross-entropy loss is reduced to a considerable extent. The proposed CNN obtains training (T) and validation (V) categorical accuracy of T = 93.7%, V = 95.6% for TESS; T = 97.5%, V = 95.6% for EMO-DB; and T = 96.7%, V = 96.7% for RAVDESS.

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022.
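The abstract does not say how the two feature sets are combined or how the metrics are computed. As a rough illustration only, the sketch below assumes fusion by simple concatenation and uses the standard definitions of categorical accuracy and cross-entropy loss. All function names and the toy numbers are hypothetical, not taken from the paper.

```python
import math

def fuse_features(handcrafted, deep):
    """Assumed fusion step: concatenate the two feature vectors."""
    return list(handcrafted) + list(deep)

def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_accuracy(probs, labels):
    """Fraction of samples whose most probable class equals the true label."""
    correct = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) == y)
    return correct / len(labels)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return -total / len(labels)

# Toy example: 2 handcrafted values + 3 deep values -> 5-dimensional fused vector.
fused = fuse_features([0.12, 0.80], [0.05, 0.33, 0.91])

# Hypothetical per-sample class probabilities and integer class labels.
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]
acc = categorical_accuracy(probs, labels)
loss = cross_entropy(probs, labels)
```

In a real pipeline the fused vector would feed the classification layers of the network. Here the probabilities are supplied by hand so that the metric definitions can be checked in isolation.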
