Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network

Authors: George Trigeorgis, Fabien Ringeval, Raymond Brueckner, Erik Marchi, Mihalis A. Nicolaou

DOI: 10.1109/ICASSP.2016.7472669

Keywords: Feature (machine learning), Speech recognition, SIGNAL (programming language), Artificial intelligence, Context (language use), Deep learning, Convolutional neural network, Representation (mathematics), Task (project management), Computer science, Signal processing

Abstract: … emotion prediction compared to optimising the mean square error objective, which is traditionally used. Finally, by further studying the activations of different cells in the recurrent layers, …

References (30)
Hans-Wilhelm Rühl, Hans-Günter Hirsch, Peter Meyer. Improved speech recognition using high-pass filtering of subband envelopes. Conference of the International Speech Communication Association (1991).
Tara N. Sainath, Oriol Vinyals, Andrew Senior, Hasim Sak. Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks. International Conference on Acoustics, Speech, and Signal Processing, pp. 4580-4584 (2015). DOI: 10.1109/ICASSP.2015.7178838
Dimitri Palaz, Ronan Collobert, Mathew Magimai.-Doss. Analysis of CNN-based speech recognition system using raw speech as input. Conference of the International Speech Communication Association, pp. 11-15 (2015).
Soroosh Mariooryad, Carlos Busso. Correcting Time-Continuous Emotional Labels by Modeling the Reaction Lag of Evaluators. IEEE Transactions on Affective Computing, vol. 6, pp. 97-108 (2015). DOI: 10.1109/TAFFC.2014.2334294
K. Scherer. Vocal communication of emotion: A review of research paradigms. Speech Communication, vol. 40, pp. 227-256 (2003). DOI: 10.1016/S0167-6393(02)00084-5
Fabien Ringeval, Florian Eyben, Eleni Kroupi, Anil Yuce, Jean-Philippe Thiran, Touradj Ebrahimi, Denis Lalanne, Björn Schuller. Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data. Pattern Recognition Letters, vol. 66, pp. 22-30 (2015). DOI: 10.1016/J.PATREC.2014.11.007
R. Schluter, I. Bezrukov, H. Wagner, H. Ney. Gammatone Features and Feature Combination for Large Vocabulary Speech Recognition. 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07, vol. 4, pp. 649-652 (2007). DOI: 10.1109/ICASSP.2007.366996
Fabien Ringeval, Björn Schuller, Michel Valstar, Shashank Jaiswal, Erik Marchi, Denis Lalanne, Roddy Cowie, Maja Pantic. AV+EC 2015. Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, pp. 3-8 (2015). DOI: 10.1145/2808196.2811642
Fabien Ringeval, Andreas Sonderegger, Juergen Sauer, Denis Lalanne. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. IEEE International Conference on Automatic Face and Gesture Recognition, pp. 1-8 (2013). DOI: 10.1109/FG.2013.6553805
Sander Dieleman, Benjamin Schrauwen. End-to-end learning for music audio. International Conference on Acoustics, Speech, and Signal Processing, pp. 6964-6968 (2014). DOI: 10.1109/ICASSP.2014.6854950