Abstract
Human emotion recognition is a research topic that receives continuous attention in the computer vision and artificial intelligence domains. This paper proposes a method for classifying human emotions through multiple neural networks based on multi-modal signals, consisting of image, landmark, and audio, in a wild environment. The proposed method has the following features. First, the learning performance of the image-based network is greatly improved by employing both multi-task learning and semi-supervised learning using the spatio-temporal characteristics of videos. Second, a model for converting one-dimensional (1D) facial landmark information into two-dimensional (2D) images is newly proposed, and a CNN-LSTM network based on the model is proposed for better emotion recognition. Third, based on the observation that audio signals are often very effective for specific emotions, we propose an audio deep learning mechanism robust to the specific emotions. Finally, so-called emotion adaptive fusion is applied to enable synergy of multiple networks. In the fifth attempt on the given test set in the EmotiW2017 challenge, the proposed method achieved a classification accuracy of 57.12%.
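The abstract does not say how emotion adaptive fusion is computed. One plausible reading is a late fusion in which each network's class probabilities are weighted per emotion, so that a modality counts more for the emotions it tends to recognize well. The sketch below illustrates that idea only and is not the authors' implementation. The function names, the seven-class label set (the AFEW labels, assumed here), and every probability and weight value are hypothetical.

```python
from typing import Dict, List

# Assumed label set (AFEW-style seven classes); not stated in the abstract.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]


def emotion_adaptive_fusion(
    probs: Dict[str, List[float]],
    weights: Dict[str, List[float]],
) -> List[float]:
    """Fuse per-modality class probabilities using per-emotion weights.

    probs[m][k]   : probability that modality m assigns to emotion k
    weights[m][k] : hypothetical trust placed in modality m for emotion k
    Returns fused scores normalized to sum to 1.
    """
    n = len(EMOTIONS)
    fused = [0.0] * n
    for modality, p in probs.items():
        w = weights[modality]
        if len(p) != n or len(w) != n:
            raise ValueError(f"expected {n} values for modality {modality!r}")
        for k in range(n):
            fused[k] += w[k] * p[k]
    total = sum(fused)
    if total <= 0:
        raise ValueError("fused scores must have a positive sum")
    return [score / total for score in fused]


def predict(probs: Dict[str, List[float]], weights: Dict[str, List[float]]) -> str:
    """Return the emotion label with the highest fused score."""
    fused = emotion_adaptive_fusion(probs, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


# Illustrative values only: the image network leans Neutral, the audio
# network leans Angry, and audio is trusted more for Angry.
probs = {
    "image": [0.30, 0.05, 0.05, 0.05, 0.40, 0.10, 0.05],
    "audio": [0.35, 0.05, 0.10, 0.05, 0.30, 0.10, 0.05],
}
uniform = {m: [1.0] * len(EMOTIONS) for m in probs}
adaptive = {
    "image": [1.0] * len(EMOTIONS),
    "audio": [2.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}

print(predict(probs, uniform))   # Neutral
print(predict(probs, adaptive))  # Angry
```

With uniform weights the two networks are simply averaged and the image network's Neutral vote wins. Raising the audio weight for Angry flips the decision, which is the kind of synergy the abstract attributes to the fusion step. In practice such weights would have to be estimated, for example from per-class validation accuracy, which is again an assumption rather than something the abstract states.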
Original language | English |
---|---|
Title of host publication | ICMI 2017 - Proceedings of the 19th ACM International Conference on Multimodal Interaction |
Editors | Edward Lank, Eve Hoggan, Sriram Subramanian, Alessandro Vinciarelli, Stephen A. Brewster |
Publisher | Association for Computing Machinery, Inc |
Pages | 529-535 |
Number of pages | 7 |
ISBN (Electronic) | 9781450355438 |
State | Published - 3 Nov 2017 |
Event | 19th ACM International Conference on Multimodal Interaction, ICMI 2017 - Glasgow, United Kingdom; Duration: 13 Nov 2017 → 17 Nov 2017 |
Publication series
Name | ICMI 2017 - Proceedings of the 19th ACM International Conference on Multimodal Interaction |
---|---|
Volume | 2017-January |
Conference
Conference | 19th ACM International Conference on Multimodal Interaction, ICMI 2017 |
---|---|
Country/Territory | United Kingdom |
City | Glasgow |
Period | 13/11/17 → 17/11/17 |
Bibliographical note
Publisher Copyright: © 2017 ACM.
Keywords
- EmotiW 2017 challenge
- Emotion recognition
- Multi-modal signal
- Multi-task learning
- Semi-supervised learning