Peer-Reviewed Article

Exploring temporal representations by leveraging attention-based bidirectional LSTM-RNNs for multi-modal emotion recognition

2020; Elsevier BV; Volume: 57; Issue: 3; Language: English

10.1016/j.ipm.2019.102185

ISSN

1873-5371

Authors

Chao Li, Zhongtian Bao, Linhao Li, Ziping Zhao

Topic(s)

Gaze Tracking and Assistive Technology

Abstract

Emotion recognition helps automatically perceive a user’s emotional response to multimedia content through implicit annotation, which in turn supports the development of effective user-centric services. Physiological-signal-based approaches have increasingly attracted researchers’ attention because of their objectivity in representing emotion. Conventional approaches to emotion recognition have mostly focused on extracting various kinds of hand-crafted features. However, hand-crafted features require domain knowledge for the specific task, and designing proper features can be time-consuming. Therefore, finding the most effective physiological temporal feature representation for emotion recognition has become the core problem of much of this work. In this paper, we propose a multimodal attention-based BLSTM network framework for efficient emotion recognition. First, the raw physiological signals from each channel are transformed into spectrogram images to capture their time and frequency information. Second, attention-based bidirectional Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) are used to automatically learn the best temporal features. The learned deep features are then fed into a deep neural network (DNN) to predict the probability of each emotional output per channel. Finally, a decision-level fusion strategy is used to predict the final emotion. Experimental results on the AMIGOS dataset show that our method outperforms other state-of-the-art methods.
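To make the pipeline described in the abstract more concrete, the following is a minimal PyTorch sketch of one per-channel branch (attention-based BLSTM over spectrogram frames, followed by a small DNN head) and a decision-level fusion step that averages the per-channel class probabilities. The layer sizes, number of channels, spectrogram shape, and averaging fusion rule are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionBLSTM(nn.Module):
    """Attention-based bidirectional LSTM over a sequence of spectrogram frames."""

    def __init__(self, input_dim, hidden_dim=64, num_classes=2):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, hidden_dim,
                             batch_first=True, bidirectional=True)
        # One attention score per time step of the BLSTM output.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        # Small DNN head mapping the attended feature to class probabilities.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        # x: (batch, time, input_dim), e.g. spectrogram frames of one channel.
        h, _ = self.blstm(x)                                    # (batch, time, 2*hidden)
        scores = self.attn(h).squeeze(-1)                       # (batch, time)
        alpha = torch.softmax(scores, dim=1)                    # attention weights over time
        context = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)   # attention-weighted sum
        return F.softmax(self.classifier(context), dim=1)       # per-class probabilities


def decision_level_fusion(channel_probs):
    """Average per-channel class probabilities (one simple decision-level fusion rule)."""
    return torch.stack(channel_probs, dim=0).mean(dim=0)


if __name__ == "__main__":
    batch, time_steps, freq_bins = 4, 100, 128                  # hypothetical spectrogram shape
    models = [AttentionBLSTM(freq_bins) for _ in range(2)]      # e.g. two physiological channels
    inputs = [torch.randn(batch, time_steps, freq_bins) for _ in models]
    probs = [m(x) for m, x in zip(models, inputs)]
    fused = decision_level_fusion(probs)
    print(fused.argmax(dim=1))                                  # predicted emotion class per sample
```

In this sketch, each channel is processed by its own branch so that the attention weights can emphasize the most emotion-relevant time steps of that channel; fusing at the decision level then lets the channels contribute independently to the final prediction.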
