Authors: Zibo Meng, Shizhong Han, Min Chen, Yan Tong
DOI: 10.1109/ISM.2015.116
Keywords:
Abstract: Recognizing facial actions from spontaneous facial displays suffers from subtle and complex facial deformations, frequent head movements, and partial occlusions. It is especially challenging when the facial activities are accompanied by speech. Instead of employing information solely from the visual channel, this paper presents a novel fusion framework that exploits both audio and visual channels in recognizing speech-related action units (AUs). In particular, features are first extracted from the audio and visual channels independently. Then, the audio and visual features are aligned in order to handle the difference in time scales and the time shift between the two signals. Finally, these aligned features are integrated via a feature-level fusion framework and utilized in recognizing AUs. Experimental results on a new audiovisual AU-coded dataset have demonstrated that the proposed framework outperforms a state-of-the-art visual-based method in recognizing speech-related AUs, especially for those AUs that are "invisible" in the visual channel during speech. The improvement is more impressive with occlusions on the facial images, which, fortunately, would not affect the audio channel.
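The abstract describes a three-step pipeline: independent audio and visual feature extraction, temporal alignment of the two streams, and feature-level fusion for AU recognition. Below is a minimal sketch of that general idea, not the authors' implementation; the feature types (landmark/appearance features, MFCCs), rates, the fixed lag, and the linear SVM classifier are all illustrative assumptions.

```python
# Hypothetical sketch of feature-level audiovisual fusion with temporal
# alignment. Assumes per-frame visual features and per-window audio
# features have already been extracted (stand-in random data used here).
import numpy as np
from sklearn.svm import LinearSVC


def align_audio_to_video(audio_feats, audio_rate, video_rate, n_frames, lag_frames=0):
    """Resample audio features to the video frame rate and apply a fixed
    shift (lag_frames) to compensate for audio/visual time differences."""
    # Map each video frame index to the nearest audio window index.
    idx = np.round(np.arange(n_frames) * audio_rate / video_rate).astype(int)
    idx = np.clip(idx + lag_frames, 0, len(audio_feats) - 1)
    return audio_feats[idx]


def fuse_features(visual_feats, audio_feats):
    """Feature-level fusion: concatenate aligned audio and visual features."""
    return np.hstack([visual_feats, audio_feats])


# --- Illustrative usage with random stand-in data ---
rng = np.random.default_rng(0)
n_frames = 300                               # video frames (e.g., 30 fps, 10 s)
visual = rng.normal(size=(n_frames, 64))     # e.g., landmark/appearance features
audio = rng.normal(size=(1000, 13))          # e.g., MFCCs at 100 windows/s
labels = rng.integers(0, 2, size=n_frames)   # per-frame AU present/absent

audio_aligned = align_audio_to_video(audio, audio_rate=100, video_rate=30,
                                     n_frames=n_frames, lag_frames=2)
fused = fuse_features(visual, audio_aligned)

clf = LinearSVC().fit(fused, labels)         # one binary classifier per AU
print("training accuracy:", clf.score(fused, labels))
```

In such a setup, fusion happens before classification (feature level) rather than by combining the outputs of separate audio and visual classifiers (decision level), which matches the framework described in the abstract.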