Expressive Visual Speech Generation

Authors: Thomas Di Giacomo, Stephane Garchery, Nadia Magnenat-Thalmann

DOI: 10.1007/978-1-84628-907-1_2


Abstract: With the emergence of 3D graphics, we are now able to create very realistic characters that can move and talk. Multimodal interaction with such characters is also possible, as various technologies have matured for speech and video analysis, natural language dialogues, and animation. However, the behavior expressed by these characters is far from believable in most systems. We feel this problem arises from their lack of individuality on several levels: perception, dialogue, and expression. In this chapter, we describe the results of research that tries to realistically connect personality to characters, not only on an expressive level (for example, generating individualized expressions on a face), but also in real-time tracking, in dialogue (generating responses that actually correspond to what someone in a certain emotional state would say), and in perception (having the virtual character use expression data perceived from the user to produce corresponding behavior). The idea of linking emotion to an agent has been discussed by Marsella et al. [33], on the influence of emotion in general, and by Johns [21], on how emotions affect decision making. Traditionally, any text- or voice-driven animation system uses phonemes as the basic units of speech and visemes as the basic units of animation. Though text-to-speech synthesizers and phoneme recognizers often use biphone-based techniques, end users seldom have access to this information, except in dedicated systems. Most commercially and freely available software applications only provide time-stamped phoneme streams along with the audio. Thus, in order to generate animation, extra processing, namely co-articulation, is required. This process takes care of the influence of neighboring phonemes on fluent speech production. This processing stage can be eliminated by using the syllable as the basic unit rather than the phoneme. Overall, we do not intend to give a complete survey of ongoing research on behavior, emotion, and personality. Our main goal is conversational agents that interact through many modalities. We thus concentrate on expression extraction from real faces (Section 2.3), visyllable-based speech animation (Section 2.4), and dialogue systems with emotions (Section 2.5).
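To make the co-articulation step concrete, below is a minimal sketch of dominance-function blending in the style of Cohen and Massaro (cited in the reference list below): each phoneme segment contributes a viseme target weighted by an exponential dominance function, and the animated parameter at any time is the normalized weighted sum. The viseme table, parameter choice (a single mouth-opening value), and function constants are illustrative assumptions, not the chapter's visyllable-based method.

```python
import math

# Hypothetical phoneme-to-viseme targets: one mouth-opening parameter in [0, 1].
# Real systems drive full viseme parameter sets (e.g., MPEG-4 FAPs).
VISEME_TARGETS = {"m": 0.0, "a": 0.9, "t": 0.2}

def dominance(t, center, alpha=1.0, theta=4.0):
    """Cohen-Massaro-style exponential dominance of a segment centered at `center`.

    alpha scales the segment's overall influence; theta controls how quickly
    its influence decays away from the segment center. Both are assumed values.
    """
    return alpha * math.exp(-theta * abs(t - center))

def coarticulated_value(t, segments):
    """Blend per-segment viseme targets, weighted by dominance at time t.

    segments: list of (phoneme, center_time_seconds) pairs, e.g. from a
    time-stamped phoneme stream produced by a TTS engine or recognizer.
    """
    num = den = 0.0
    for phoneme, center in segments:
        d = dominance(t, center)
        num += d * VISEME_TARGETS[phoneme]
        den += d
    return num / den if den else 0.0

# The word "mat": three phonemes with (assumed) centers at 0.10 s, 0.25 s, 0.40 s.
segments = [("m", 0.10), ("a", 0.25), ("t", 0.40)]
for ms in range(0, 500, 50):
    t = ms / 1000.0
    print(f"t={t:.2f}s  mouth-open={coarticulated_value(t, segments):.3f}")
```

Because every segment's dominance is nonzero everywhere, neighboring phonemes always pull on the current mouth shape, which is exactly the smoothing effect co-articulation provides; a syllable-based unit sidesteps this step by baking the intra-syllable transitions into the unit itself.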

References (38)
John S. D. Mason, Simon Downey, Rhys James Jones, "Continuous speech recognition using syllables." Conference of the International Speech Communication Association (1997).
Bernd Möbius, George Anton Kiraz, "Multilingual syllabification using weighted finite-state transducers." SSW, pp. 71-76 (1998).
Taro Goto, Marc Escher, Christian Zanardi, Nadia Magnenat-Thalmann, "MPEG-4 based animation with face feature tracking." Computer Animation and Simulation, pp. 89-98 (1999). DOI: 10.1007/978-3-7091-6423-5_9
Elisabeth André, Martin Klesen, Patrick Gebhard, Steve Allen, Thomas Rist, "Integrating models of personality and emotions into lifelike characters." International Workshop on Affective Interactions, pp. 150-165 (2001). DOI: 10.1007/10720296_11
N. Magnenat-Thalmann, D. Thalmann, T. Capin, I. Pandzic, "Towards Natural Communication in Networked Collaborative Virtual Environments." Proc. FIVE 1996 (1996).
Jack Breese, J. Eugene Ball, "Emotion and Personality in a Conversational Character" (1998).
Juan D. Velásquez, "Modeling emotions and other motivations in synthetic agents." National Conference on Artificial Intelligence, pp. 10-15 (1997).
Michael M. Cohen, Dominic W. Massaro, "Modeling Coarticulation in Synthetic Visual Speech." Models and Techniques in Computer Animation, pp. 139-156 (1993). DOI: 10.1007/978-4-431-66911-1_13
Stéphane Garchery, Ronan Boulic, Tolga Capin, Prem Kalra, "Standards for Virtual Humans." John Wiley & Sons, Ltd., pp. 373-391 (2006). DOI: 10.1002/0470023198.CH16