Feature Level Fusion for Bimodal Facial Action Unit Recognition

Authors: Zibo Meng, Shizhong Han, Min Chen, Yan Tong

DOI: 10.1109/ISM.2015.116

Keywords:

Abstract: Recognizing facial actions from spontaneous displays suffers from subtle and complex facial deformation, frequent head movements, and partial occlusions. It is especially challenging when the facial activities are accompanied with speech. Instead of employing information solely from the visual channel, this paper presents a novel fusion framework, which exploits both visual and audio channels in recognizing speech-related action units (AUs). In particular, features are first extracted from the two channels, independently. Then, they are aligned in order to handle the difference in time scales and the time shift between the two signals. Finally, these aligned features are integrated via a feature-level fusion framework and utilized in recognizing AUs. Experimental results on a new audiovisual AU-coded dataset have demonstrated that the proposed framework outperforms a state-of-the-art visual-based method in recognizing speech-related AUs, especially for those AUs that are "invisible" in the visual channel during speech. The improvement is more impressive with partial occlusions on the facial images, which, fortunately, would not affect the audio channel.
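The pipeline described in the abstract (per-channel feature extraction, temporal alignment to handle the different frame rates of the two signals, then frame-wise concatenation for feature-level fusion) can be sketched as follows. This is a minimal illustration, not the paper's exact method: the function name `fuse_features`, the linear-interpolation alignment, and the toy feature dimensions are all assumptions for demonstration.

```python
import numpy as np

def fuse_features(video_feats, audio_feats):
    """Feature-level fusion of per-frame visual and acoustic features.

    Audio features are typically extracted at a higher frame rate than
    video features, so they are first resampled (linear interpolation)
    onto the video timeline, then concatenated frame by frame.
    """
    n_video = video_feats.shape[0]
    n_audio = audio_feats.shape[0]
    # Fractional audio-frame index corresponding to each video frame.
    idx = np.linspace(0.0, n_audio - 1, n_video)
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, n_audio - 1)
    w = (idx - lo)[:, None]
    aligned_audio = (1.0 - w) * audio_feats[lo] + w * audio_feats[hi]
    # Concatenate the two modalities per video frame (feature-level fusion).
    return np.hstack([video_feats, aligned_audio])

# Toy example: 30 video frames with 5-dim visual features and
# 100 audio frames with 3-dim acoustic features (e.g., MFCC-like).
video = np.arange(30 * 5, dtype=float).reshape(30, 5)
audio = np.arange(100 * 3, dtype=float).reshape(100, 3)
fused = fuse_features(video, audio)
print(fused.shape)  # (30, 8)
```

The fused per-frame vectors would then be fed to an AU classifier; any fixed time shift between the two channels could be handled by offsetting `idx` before interpolation.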

References (21)
Petr Motlicek, Georg Stemmer, Ondrej Glembek, Karel Vesely, Lukas Burget, Gilles Boulianne, Yanmin Qian, Mirko Hannemann, Nagendra Goel, Petr Schwarz, Arnab Ghoshal, Jan Silovsky, Daniel Povey, "The Kaldi Speech Recognition Toolkit," IEEE Automatic Speech Recognition and Understanding Workshop, 2011.
Michel F. Valstar, Timur Almaev, Jeffrey M. Girard, Gary McKeown, Marc Mehu, Lijun Yin, Maja Pantic, Jeffrey F. Cohn, "FERA 2015 - Second Facial Expression Recognition and Analysis Challenge," IEEE International Conference on Automatic Face and Gesture Recognition, vol. 6, pp. 1-8, 2015. DOI: 10.1109/FG.2015.7284874
P. Boersma, "Praat, a system for doing phonetics by computer," Glot International, vol. 5, pp. 341-345, 2002.
Evangelos Sariyanidi, Hatice Gunes, Andrea Cavallaro, "Automatic Analysis of Facial Affect: A Survey of Registration, Representation, and Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, pp. 1113-1133, 2015. DOI: 10.1109/TPAMI.2014.2366127
T. Senechal, V. Rapp, H. Salam, R. Seguier, K. Bailly, L. Prevost, "Facial Action Recognition Combining Heterogeneous Features via Multikernel Learning," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 42, pp. 993-1005, 2012. DOI: 10.1109/TSMCB.2012.2193567
M. F. Valstar, M. Mehu, Bihan Jiang, M. Pantic, K. Scherer, "Meta-Analysis of the First Facial Expression Recognition Challenge," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 42, pp. 966-979, 2012. DOI: 10.1109/TSMCB.2012.2200675
Jing Huang, Brian Kingsbury, "Audio-visual deep learning for noise robust speech recognition," IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 7596-7599, 2013. DOI: 10.1109/ICASSP.2013.6639140
Jiahong Yuan, Mark Liberman, "Speaker identification on the SCOTUS corpus," Journal of the Acoustical Society of America, vol. 123, p. 3878, 2008. DOI: 10.1121/1.2935783
C. Y. Fook, M. Hariharan, Sazali Yaacob, A. H. Adom, "A review: Malay speech recognition and audio visual speech recognition," International Conference on Biomedical Engineering, pp. 479-484, 2012. DOI: 10.1109/ICOBE.2012.6179063
Shizhong Han, Zibo Meng, Ping Liu, Yan Tong, "Facial grid transformation: A novel face registration approach for improving facial action unit recognition," IEEE International Conference on Image Processing (ICIP), pp. 1415-1419, 2014. DOI: 10.1109/ICIP.2014.7025283