Tensor Fusion Network for Multimodal Sentiment Analysis

Authors: Soujanya Poria, Louis-Philippe Morency, Amir Zadeh, Minghai Chen, Erik Cambria

DOI:

Keywords: Speech recognition, Artificial intelligence, Computer science, Spoken language, Gesture, Natural language processing, Sentiment analysis, Tensor (intrinsic definition), Dynamics (music)

Abstract: Multimodal sentiment analysis is an increasingly popular research area, which extends the conventional language-based definition of sentiment analysis to a multimodal setup where other relevant modalities accompany language. In this paper, we pose the problem of multimodal sentiment analysis as modeling intra-modality and inter-modality dynamics. We introduce a novel model, termed Tensor Fusion Network, which learns both such dynamics end-to-end. The proposed approach is tailored for the volatile nature of spoken language in online videos as well as accompanying gestures and voice. In the experiments, our model outperforms state-of-the-art approaches for both multimodal and unimodal sentiment analysis.
