A Talking Face Driven by Voice using Hidden Markov Model

Authors: Wen-Kai Tai, Mau-Tsuen Yang, Guang-Yi Wang, Cheng-Chin Chiang

DOI: 10.6688/JISE.2006.22.5.5

Keywords: Artificial intelligence, Computer facial animation, Computer vision, Virtual reality, Face (geometry), Mel-frequency cepstrum, Audio signal, Synchronization, Cepstrum, Computer science, Speech recognition, Hidden Markov model

Abstract: In this paper, we utilized the Hidden Markov Model (HMM) as a mapping mechanism between two different kinds of correlated signals. Specifically, we developed a voice-driven talking head system by exploiting the physical relationship between the shape of the mouth and the sound that is produced. The proposed system can be easily trained and efficiently animated. In the training phase, Mel-scale Frequency Cepstral Coefficients (MFCC) were analyzed from the audio signals and Facial Animation Parameters (FAP) were extracted from the video. Both features were then integrated to train a single HMM. In the synthesis phase, the trained HMM was used to correlate a completely novel audio track to a FAP sequence for face animation with the help of a Face Animation Engine (FAE). The experiments demonstrated the effects on a man and a woman, in two styles (speaking and singing), using three languages (Chinese, English and Taiwanese). Possible applications include computer-aided instruction, online guides, virtual conferencing, lip synchronization, human-computer interaction and so on.
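
The abstract outlines a two-stage pipeline: a training phase that fits a single HMM on concatenated audio (MFCC) and visual (FAP) feature vectors, and a synthesis phase that decodes a novel audio track into a FAP sequence for the Face Animation Engine. As a rough illustration of the training phase only, the sketch below assumes Python with librosa for MFCC extraction and hmmlearn for the Gaussian HMM; the paper does not specify an implementation, and FAP extraction from video is left as a placeholder since it requires a separate facial feature tracker.

```python
# Sketch of the training phase (assumed tooling: librosa + hmmlearn, not from the paper).
import numpy as np
import librosa
from hmmlearn import hmm

N_MFCC = 13     # Mel-scale cepstral coefficients per audio frame
N_FAP = 68      # Facial Animation Parameters per video frame (assumed count)
N_STATES = 40   # number of HMM states (assumed)

def extract_mfcc(wav_path, fps=25):
    """MFCC features, one vector per video frame."""
    y, sr = librosa.load(wav_path, sr=16000)
    hop = sr // fps                      # align audio analysis frames with video frames
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC, hop_length=hop)
    return mfcc.T                        # shape: (n_frames, N_MFCC)

def extract_fap(video_path, n_frames):
    """Placeholder for FAP extraction from the training video (tracker not shown)."""
    return np.zeros((n_frames, N_FAP))

# Integrate both feature streams into one observation sequence and train a single HMM.
mfcc = extract_mfcc("training_clip.wav")
fap = extract_fap("training_clip.avi", len(mfcc))
joint = np.hstack([mfcc, fap])           # one (MFCC | FAP) vector per frame

model = hmm.GaussianHMM(n_components=N_STATES, covariance_type="diag", n_iter=50)
model.fit(joint)
```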

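For the synthesis phase, one reading of the HMM-based mapping is as follows: because a diagonal-covariance Gaussian factorizes over dimensions, the audio marginal of each state keeps only the MFCC part of its mean and covariance; a Viterbi pass over the novel MFCC sequence then yields the most likely state path, and the FAP part of each decoded state's mean is emitted as the animation trajectory handed to the FAE. The sketch below continues the one above and implements that marginalize-then-decode idea; it is an assumption about the mapping, not code from the paper.

```python
# Sketch of the synthesis phase, continuing the training sketch above
# (marginalize the joint HMM to audio, Viterbi-decode, emit per-state FAP means).
import numpy as np
from scipy.stats import multivariate_normal

def synthesize_fap(model, wav_path):
    """Map a novel audio track to a FAP sequence using the trained joint HMM."""
    mfcc = extract_mfcc(wav_path)                      # (T, N_MFCC)

    # Audio marginal of each state: first N_MFCC dims of the mean and diagonal covariance.
    means_a = model.means_[:, :N_MFCC]
    covars_a = np.array([np.diag(c)[:N_MFCC] for c in model.covars_])

    # Per-frame log-likelihood of each MFCC vector under each state's audio marginal.
    T, K = len(mfcc), model.n_components
    loglik = np.empty((T, K))
    for k in range(K):
        loglik[:, k] = multivariate_normal.logpdf(mfcc, means_a[k], np.diag(covars_a[k]))

    # Viterbi decoding over the learned start and transition probabilities.
    log_trans = np.log(model.transmat_ + 1e-12)
    delta = np.log(model.startprob_ + 1e-12) + loglik[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans            # scores[i, j]: best path ending in i, then moving to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + loglik[t]
    states = np.empty(T, dtype=int)
    states[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]

    # The FAP part of each decoded state's mean is the trajectory for the FAE.
    return model.means_[states, N_MFCC:]               # shape: (T, N_FAP)

fap_track = synthesize_fap(model, "novel_utterance.wav")
```
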
References (15)
Alex Acero, Xuedong Huang, Hsiao-Wuen Hon, Spoken Language Processing, Prentice-Hall, pp. 1008, (2001)
Keith Waters, Thomas M. Levergood, DECface: an automatic lip-synchronization algorithm for synthetic faces, Technical Report CRL 93/4, (1993)
Yiqiang Chen, Wen Gao, Zhaoqi Wang, Li Zuo, Speech Driven MPEG-4 Based Face Animation via Neural Network, Pacific Rim Conference on Multimedia, pp. 1108-1113, (2001), 10.1007/3-540-45453-5_152
E. Yamamoto, S. Nakamura, K. Shikano, Lip movement synthesis from speech based on hidden Markov models, Speech Communication, vol. 26, pp. 105-115, (1998), 10.1016/S0167-6393(98)00054-5
Tony Ezzat, Gadi Geiger, Tomaso Poggio, Trainable videorealistic speech animation, International Conference on Computer Graphics and Interactive Techniques, vol. 21, pp. 388-398, (2002), 10.1145/566570.566594
P.S. Aleksic, A.K. Katsaggelos, Speech-to-video synthesis using MPEG-4 compliant visual features, IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 682-692, (2004), 10.1109/TCSVT.2004.826760
L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol. 77, pp. 257-286, (1989), 10.1109/5.18626
Yong-Yuan Lin, Ya-Chun Shih, Mau-Tsuen Yang, VEC3D: a 3-D virtual English classroom for second language learning, International Conference on Advanced Learning Technologies, pp. 906-908, (2005), 10.1109/ICALT.2005.302
J.J. Williams, A.K. Katsaggelos, An HMM-based speech-to-video synthesizer, IEEE Transactions on Neural Networks, vol. 13, pp. 900-915, (2002), 10.1109/TNN.2002.1021891
S. Morishima, H. Harashima, Speech-to-image media conversion based on VQ and neural network, International Conference on Acoustics, Speech, and Signal Processing, pp. 2865-2868, (1991), 10.1109/ICASSP.1991.151000