Deep learning for robust feature generation in audiovisual emotion recognition

Authors: Yelin Kim, Honglak Lee, Emily Mower Provost

DOI: 10.1109/ICASSP.2013.6638346

Keywords: SIGNAL (programming language), Emotion classification, Speech recognition, Focus (optics), Feature selection, Artificial intelligence, Feature (machine learning), Deep belief network, Computer science, Machine learning, Deep learning

Abstract: … features for audio-visual emotion recognition. Emotion recognition accuracy relies heavily on the ability to generate representative features. However, this is a very challenging problem. …

References (37)
Włodzisław Duch, Jacek Biesiada, Tomasz Winiarski, Karol Grudziński, Krzysztof Grąbczewski. Feature Selection Based on Information Theory Filters. Physica, Heidelberg, pp. 173-178 (2003). DOI: 10.1007/978-3-7908-1902-1_23
Carlos Busso, Sungbok Lee, Shrikanth S. Narayanan. Using Neutral Speech Models for Emotional Speech Analysis. Conference of the International Speech Communication Association, pp. 2225-2228 (2007).
Tim Polzehl, Hamed Ketabdar, Michael Wagner, Florian Metze, Shiva Sundaram. Emotion Classification in Children's speech using fusion of acoustic and linguistic features. Conference of the International Speech Communication Association, pp. 340-343 (2009).
Alessandro Vinciarelli, Elmar Nöth, Rob van Son, Björn W. Schuller, Stefan Steidl, Felix Burkhardt, Benjamin Weiss, Tobias Bocklet, Florian Eyben, Felix Weninger, Gelareh Mohammadi, Anton Batliner. The INTERSPEECH 2012 Speaker Trait Challenge. Conference of the International Speech Communication Association, pp. 254-257 (2012).
Björn W. Schuller, Stefan Steidl, Anton Batliner. The INTERSPEECH 2009 Emotion Challenge. Conference of the International Speech Communication Association, pp. 312-315 (2009).
P. Smolensky. Information processing in dynamical systems: foundations of harmony theory. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, pp. 194-281 (1986).
Gerhard Rigoll, Bernd Radig, Dejan Arsic, Björn W. Schuller, Matthias Wimmer. Low-Level Fusion of Audio and Video Feature for Multi-Modal Emotion Recognition. International Conference on Computer Vision Theory and Applications, pp. 145-151 (2008).
Chris Eliasmith, Yichuan Tang. Deep networks for robust visual recognition. International Conference on Machine Learning, pp. 1055-1062 (2010).
Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. Unsupervised learning of hierarchical representations with convolutional deep belief networks. Communications of the ACM, vol. 54, pp. 95-103 (2011). DOI: 10.1145/2001269.2001295
Abdel-rahman Mohamed, George E. Dahl, Geoffrey Hinton. Acoustic Modeling Using Deep Belief Networks. IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, pp. 14-22 (2012). DOI: 10.1109/TASL.2011.2109382