Peer-reviewed article

Improving the accuracy of speech emotion recognition using acoustic landmarks and Teager energy operator features

2015; Acoustical Society of America; Volume: 137; Issue: 4_Supplement; Language: English

10.1121/1.4920410

ISSN

1520-9024

Authors

Reza Asadi, Harriet Fell

Topic(s)

Emotion and Mood Recognition

Abstract

Affective computing can help us achieve more intelligent user interfaces by adding the ability to recognize users’ emotions. Human speech contains information about the emotional state of the speaker and can be used in emotion recognition systems. In this paper, we present a machine learning approach using acoustic features which improves the accuracy of speech emotion recognition. We used 698 speech samples from the “Emotional Prosody Speech and Transcripts” corpus to train and test the classifiers. The emotions used were happy, sadness, hot anger, panic, and neutral. Mel-frequency Cepstral Coefficients (MFCC), Teager Energy Operator (TEO) features, and acoustic landmark features were extracted from the speech samples. Models were trained using multinomial logistic regression, k-Nearest Neighbors (k-NN), and Support Vector Machine (SVM) classifiers. The results show that adding landmark and TEO features to MFCC features improves classification accuracy. SVM classifiers with a Gaussian kernel performed best, with an average accuracy of 90.43%. We achieved a significant improvement in classification accuracy compared to a previous study using the same dataset.
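For readers unfamiliar with the Teager Energy Operator named in the abstract, the following minimal sketch shows the standard discrete form of the operator, ψ[n] = x[n]² − x[n−1]·x[n+1]. The abstract does not say which TEO-derived features the authors computed or how they were framed, so this illustrates only the basic operator, not the paper's feature extraction. The function name `teager_energy` and the test tone (amplitude, frequency, sampling rate) are assumptions chosen for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    # Each output needs one neighbour on either side, so the result
    # is two samples shorter than the input.
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative signal (an assumption, not data from the paper):
# a pure 200 Hz tone sampled at 8 kHz with amplitude 0.5.
amplitude = 0.5
omega = 2 * math.pi * 200 / 8000  # angular frequency in radians per sample
signal = [amplitude * math.cos(omega * n) for n in range(64)]

energy = teager_energy(signal)

# For a pure sinusoid the operator is constant: (A * sin(omega))^2.
expected = (amplitude * math.sin(omega)) ** 2
```

On a pure tone the output is constant and depends on both amplitude and frequency, which is why the operator is often described as tracking the energy needed to generate a signal rather than its squared amplitude alone.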
