Audio-visual speech translation with automatic lip synchronization and face tracking based on 3-D head model

Authors: Shigeo Morishima, Shin Ogata, Kazumasa Murai, Satoshi Nakamura

DOI: 10.1109/ICASSP.2002.5745053

Abstract: Speech-to-speech translation has been studied as a way to realize natural human communication across language barriers. Toward richer multi-modal communication, visual information such as face and lip movements will also be necessary. In this paper, we introduce an English-to-Japanese and Japanese-to-English system that also translates the speaker's speech motion, synchronizing it with the translated speech. To retain the speaker's facial expression, we substitute only the image of the speech organs with a synthesized one, generated from a three-dimensional wire-frame model that can be adapted to any speaker. Our approach enables synthesis with an extremely small database. We conduct a subjective evaluation based on connected-digit discrimination, using data without audio-visual lip synchronicity. The results confirm that the proposed audio-visual system achieves sufficient quality.
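The abstract does not say how mouth motion is aligned with the translated speech. One common way to illustrate lip synchronization in general is to take phoneme timings from the synthesized speech, place a mouth-shape (viseme) target at the centre of each phoneme, and linearly interpolate those targets at the video frame rate. The sketch below shows that generic idea only; it is not the authors' method. The phoneme labels, the `VISEME_TARGETS` table with its two parameters (mouth opening and width), the 30 fps frame rate, and the `Segment`/`mouth_track` names are all hypothetical. Driving a 3-D wire-frame model and compositing the rendered mouth into the video are not shown.

```python
from dataclasses import dataclass

# Hypothetical viseme targets: (mouth opening, mouth width), normalized to [0, 1].
# Illustrative assumptions only, not parameters from the paper.
VISEME_TARGETS = {
    "sil": (0.0, 0.5),
    "a": (0.9, 0.6),
    "i": (0.3, 0.9),
    "u": (0.3, 0.2),
    "e": (0.5, 0.8),
    "o": (0.6, 0.3),
    "m": (0.0, 0.5),
}


@dataclass
class Segment:
    """One phoneme of the (hypothetical) synthesized translated speech."""
    phoneme: str
    start: float  # seconds
    end: float    # seconds


def keyframes(segments):
    """Place one viseme target at the temporal centre of each phoneme segment."""
    return [
        ((s.start + s.end) / 2.0, VISEME_TARGETS.get(s.phoneme, VISEME_TARGETS["sil"]))
        for s in segments
    ]


def mouth_track(segments, fps=30.0):
    """Sample mouth parameters for every video frame by linear interpolation between keyframes."""
    keys = keyframes(segments)
    n_frames = int(round(segments[-1].end * fps))
    track = []
    for f in range(n_frames):
        t = f / fps
        if t <= keys[0][0]:          # before the first keyframe: hold the first target
            track.append(keys[0][1])
            continue
        if t >= keys[-1][0]:         # after the last keyframe: hold the last target
            track.append(keys[-1][1])
            continue
        for (t0, p0), (t1, p1) in zip(keys, keys[1:]):
            if t0 <= t <= t1:
                w = (t - t0) / (t1 - t0) if t1 > t0 else 0.0
                track.append(tuple(a + w * (b - a) for a, b in zip(p0, p1)))
                break
    return track


if __name__ == "__main__":
    segs = [Segment("sil", 0.0, 0.1), Segment("a", 0.1, 0.3), Segment("m", 0.3, 0.4)]
    for i, params in enumerate(mouth_track(segs)):
        print(i, params)
```

Linear interpolation keeps the sketch short. A real system would more likely use smoother blending and account for coarticulation between neighbouring phonemes.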
