Semi-Coupled Hidden Markov Model with State-Based Alignment Strategy for Audio-Visual Emotion Recognition

Authors: Jen-Chun Lin, Chung-Hsien Wu, Wen-Li Wei

DOI: 10.1007/978-3-642-24600-5_22

Abstract: This paper presents an approach to bi-modal emotion recognition based on a semi-coupled hidden Markov model (SC-HMM). A simplified state-based alignment strategy in SC-HMM is proposed to align the temporal relation of states between audio and visual streams. Based on this strategy, the SC-HMM can alleviate the problem of data sparseness and achieve better statistical dependency between the audio and visual HMMs in most real-world scenarios. For performance evaluation, audio-visual signals with four emotional states (happy, neutral, angry, and sad) were collected. Each of the seven invited subjects was asked to utter 30 types of sentences twice to generate speech and facial expressions for each emotion. Experimental results show that the proposed approach outperforms other fusion-based methods.
