Robust Speech Emotion Recognition: A Convolutional Neural Network Approach
Paper ID: 1100-ICEEM2025 (R1)
Authors
Niveen Hassan*, Adel El-Fishawy, Fathy El-Sayed, Mohamed Arafa
Department of Electronics and Electrical Communications Engineering
Abstract |
Speech Emotion Recognition (SER) is the process of identifying human emotions and affective states from speech, based on the observation that voice pitch and tone typically reveal underlying emotions. Demand for emotion recognition is high and growing. This paper employs a Convolutional Neural Network (CNN) to classify emotions from audio recordings across a spectrum of emotional categories, using the modalities, emotions, intensities, repetitions, and other relevant characteristics included in the data. We use a deep learning technique to build a model that recognizes emotions from audio samples; the main goal is to evaluate the influence and cross-relation of all input and output parameters in order to properly configure and train the deep learning framework. The proposed model is evaluated experimentally on two databases, RAVDESS and TESS, primarily in terms of precision, recall, and F1 score. In addition, the data augmentation method applied in this paper improves the recognition accuracy and stability of the system across the different databases. The variation of loss and accuracy with training epochs, together with the confusion matrices of the model on the two datasets, is investigated. According to the experimental results, the proposed system achieves higher recognition rates than related work in speech emotion recognition.
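The abstract credits data augmentation with improving recognition accuracy and stability. As a minimal illustrative sketch only (the paper's exact augmentation pipeline is not specified here), two augmentations commonly used in SER, additive noise injection and time shifting of the waveform, can be written as:

```python
import numpy as np

def add_noise(signal, noise_factor=0.005, rng=None):
    """Inject Gaussian noise into the waveform (assumed augmentation step)."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(len(signal))
    return signal + noise_factor * noise

def time_shift(signal, shift):
    """Circularly shift the waveform by `shift` samples (assumed augmentation step)."""
    return np.roll(signal, shift)

# Example on a synthetic 1-second "utterance" at 16 kHz (hypothetical values)
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 440 * t)

noisy = add_noise(clean)
shifted = time_shift(clean, sr // 10)  # shift by 100 ms
```

Each augmented copy keeps the original emotion label, so the effective training set grows without new recordings; pitch shifting and time stretching are further common variants.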
Keywords |
Deep learning models, Mel-frequency cepstral coefficients, RAVDESS, TESS, data augmentation, convolutional neural networks
Status: Accepted |