Authors: A. Metallinou, M. Wöllmer, A. Katsamanis, F. Eyben, B. Schuller
Keywords:
Abstract: Human emotional expression tends to evolve in a structured manner, in the sense that certain emotional evolution patterns, i.e., anger to anger, are more probable than others, e.g., anger to happiness. Furthermore, the perception of an emotional display can be affected by recent emotional displays. Therefore, the emotional content of past and future observations could offer relevant temporal context when classifying a current observation. In this work, we focus on audio-visual emotion recognition of improvised emotional interactions at the utterance level. We examine context-sensitive schemes for emotion recognition within a multimodal, hierarchical approach: bidirectional Long Short-Term Memory (BLSTM) neural networks, Hidden Markov Model classifiers (HMMs), and hybrid HMM/BLSTM classifiers are considered for modeling emotion evolution between utterances over the course of a dialog. Overall, our experimental results indicate that incorporating long-term temporal context is beneficial for emotion recognition systems that encounter a variety of emotional manifestations. Context-sensitive approaches outperform those without context for classification tasks such as discrimination between valence levels or between clusters in the valence-activation space. The analysis of emotional transitions in our database sheds light into the flow of affective expressions, revealing potentially useful patterns.
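The abstract's central modeling idea, a bidirectional LSTM that classifies an utterance while attending to both past and future frames of its feature sequence, can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes PyTorch, and the class name, feature dimensionality, hidden size, number of emotion classes, and mean pooling over frames are all hypothetical choices made for the example.

```python
# Minimal sketch (not the paper's system): a bidirectional LSTM that maps a
# frame-level audio-visual feature sequence to an utterance-level emotion
# class. Assumes PyTorch; all dimensions below are illustrative.
import torch
import torch.nn as nn

class BLSTMEmotionClassifier(nn.Module):
    def __init__(self, feat_dim=100, hidden_dim=128, num_classes=4):
        super().__init__()
        # The bidirectional LSTM reads the frame sequence forwards and
        # backwards, so each frame's representation carries both past and
        # future temporal context.
        self.blstm = nn.LSTM(feat_dim, hidden_dim,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):          # x: (batch, frames, feat_dim)
        h, _ = self.blstm(x)       # h: (batch, frames, 2 * hidden_dim)
        pooled = h.mean(dim=1)     # average over frames -> utterance vector
        return self.out(pooled)    # emotion-class logits per utterance

# Illustrative usage: a batch of 8 utterances, 200 frames each.
model = BLSTMEmotionClassifier()
logits = model(torch.randn(8, 200, 100))
print(logits.shape)                # torch.Size([8, 4])
```

In the paper's hierarchical setting, utterance-level outputs like these would then feed a second stage (e.g., an HMM or another BLSTM) that models emotion evolution across utterances over the dialog; the sketch above covers only the within-utterance step.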