Audiovisual Facial Action Unit Recognition using Feature Level Fusion

Authors: Zibo Meng, Shizhong Han, Min Chen, Yan Tong

DOI: 10.4018/IJMDEM.2016010104

Keywords:

Abstract: Recognizing facial actions is challenging, especially when they are accompanied by speech. Instead of relying on information from the visual channel alone, this work exploits information from both the visual and audio channels to recognize speech-related facial action units (AUs). Two feature-level fusion methods are proposed. The first is based on hand-crafted visual features; the other uses visual features learned by a deep convolutional neural network (CNN). In both methods, features are extracted independently from the visual and audio channels and then aligned to handle the difference in time scales and the time shift between the two signals. The temporally aligned features are integrated via feature-level fusion for AU recognition. Experimental results on a new audiovisual AU-coded dataset demonstrate that both fusion methods outperform their visual-only counterparts in recognizing speech-related AUs. The improvement is more pronounced when the facial images are occluded, since occlusion does not affect the audio channel.
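The alignment-then-concatenation pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the use of linear interpolation for resampling, and the fixed time-shift parameter are all assumptions introduced here for clarity.

```python
import numpy as np

def align_and_fuse(visual_feats, visual_times, audio_feats, audio_times, shift=0.0):
    """Illustrative feature-level fusion of two feature streams.

    visual_feats: (Nv, Dv) array of per-frame visual features
    visual_times: (Nv,) timestamps of the visual frames (seconds)
    audio_feats:  (Na, Da) array of per-frame audio features
    audio_times:  (Na,) timestamps of the audio frames (seconds)
    shift:        assumed fixed time offset between the two signals

    Resamples the audio stream onto the visual frame times and
    concatenates the two aligned streams per frame.
    """
    target = visual_times + shift
    # Linearly interpolate each audio feature dimension at the
    # (shift-compensated) visual timestamps to handle the different
    # frame rates of the two channels.
    aligned_audio = np.stack(
        [np.interp(target, audio_times, audio_feats[:, d])
         for d in range(audio_feats.shape[1])],
        axis=1)
    # Feature-level fusion: concatenate along the feature dimension,
    # yielding one (Dv + Da)-dimensional vector per visual frame.
    return np.concatenate([visual_feats, aligned_audio], axis=1)
```

The fused vectors would then be fed to a per-AU classifier; which classifier the paper uses is not stated in the abstract.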
