A Talking Face Driven by Voice using Hidden Markov Model

Authors: Wen-Kai Tai, Mau-Tsuen Yang, Guang-Yi Wang, Cheng-Chin Chiang

DOI: 10.6688/JISE.2006.22.5.5

Keywords: Artificial intelligence, Computer facial animation, Computer vision, Virtual reality, Face (geometry), Mel-frequency cepstrum, Audio signal, Synchronization, Cepstrum, Computer science, Speech recognition, Hidden Markov model

Abstract: In this paper, we utilized the Hidden Markov Model (HMM) as a mapping mechanism between two different kinds of correlated signals. Specifically, we developed a voice-driven talking head system by exploiting the physical relationship between the shape of the mouth and the sound that is produced. The proposed system can be easily trained and efficiently animated. In the training phase, Mel-scale Frequency Cepstral Coefficients (MFCC) were analyzed from the audio signals and Facial Animation Parameters (FAP) were extracted from the video. Both features were then integrated to train a single HMM. In the synthesis phase, the trained HMM was used to correlate a completely novel audio track to a FAP sequence for face animation with the help of a Face Animation Engine (FAE). The experiments demonstrated the effects on a man and a woman, in two styles (speaking and singing), using three languages (Chinese, English and Taiwanese). Possible applications include computer-aided instruction, online guides, virtual conferencing, lip synchronization, human-computer interaction and so on.
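
The abstract outlines a two-stage pipeline: a training phase that fits a single HMM on concatenated audio (MFCC) and visual (FAP) feature vectors, and a synthesis phase that decodes a novel audio track into a FAP sequence for the Face Animation Engine. As a rough illustration of the training phase only, the sketch below assumes Python with librosa for MFCC extraction and hmmlearn for the Gaussian HMM; the paper does not specify an implementation, and FAP extraction from video is left as a placeholder since it requires a separate facial feature tracker.

```python
# Sketch of the training phase (assumed tooling: librosa + hmmlearn, not from the paper).
import numpy as np
import librosa
from hmmlearn import hmm

N_MFCC = 13     # Mel-scale cepstral coefficients per audio frame
N_FAP = 68      # Facial Animation Parameters per video frame (assumed count)
N_STATES = 40   # number of HMM states (assumed)

def extract_mfcc(wav_path, fps=25):
    """MFCC features, one vector per video frame."""
    y, sr = librosa.load(wav_path, sr=16000)
    hop = sr // fps                      # align audio analysis frames with video frames
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC, hop_length=hop)
    return mfcc.T                        # shape: (n_frames, N_MFCC)

def extract_fap(video_path, n_frames):
    """Placeholder for FAP extraction from the training video (tracker not shown)."""
    return np.zeros((n_frames, N_FAP))

# Integrate both feature streams into one observation sequence and train a single HMM.
mfcc = extract_mfcc("training_clip.wav")
fap = extract_fap("training_clip.avi", len(mfcc))
joint = np.hstack([mfcc, fap])           # one (MFCC | FAP) vector per frame

model = hmm.GaussianHMM(n_components=N_STATES, covariance_type="diag", n_iter=50)
model.fit(joint)
```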

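For the synthesis phase, one reading of the HMM-based mapping is as follows: because a diagonal-covariance Gaussian factorizes over dimensions, the audio marginal of each state keeps only the MFCC part of its mean and covariance; a Viterbi pass over the novel MFCC sequence then yields the most likely state path, and the FAP part of each decoded state's mean is emitted as the animation trajectory handed to the FAE. The sketch below continues the one above and implements that marginalize-then-decode idea; it is an assumption about the mapping, not code from the paper.

```python
# Sketch of the synthesis phase, continuing the training sketch above
# (marginalize the joint HMM to audio, Viterbi-decode, emit per-state FAP means).
import numpy as np
from scipy.stats import multivariate_normal

def synthesize_fap(model, wav_path):
    """Map a novel audio track to a FAP sequence using the trained joint HMM."""
    mfcc = extract_mfcc(wav_path)                      # (T, N_MFCC)

    # Audio marginal of each state: first N_MFCC dims of the mean and diagonal covariance.
    means_a = model.means_[:, :N_MFCC]
    covars_a = np.array([np.diag(c)[:N_MFCC] for c in model.covars_])

    # Per-frame log-likelihood of each MFCC vector under each state's audio marginal.
    T, K = len(mfcc), model.n_components
    loglik = np.empty((T, K))
    for k in range(K):
        loglik[:, k] = multivariate_normal.logpdf(mfcc, means_a[k], np.diag(covars_a[k]))

    # Viterbi decoding over the learned start and transition probabilities.
    log_trans = np.log(model.transmat_ + 1e-12)
    delta = np.log(model.startprob_ + 1e-12) + loglik[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans            # scores[i, j]: best path ending in i, then moving to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + loglik[t]
    states = np.empty(T, dtype=int)
    states[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]

    # The FAP part of each decoded state's mean is the trajectory for the FAE.
    return model.means_[states, N_MFCC:]               # shape: (T, N_FAP)

fap_track = synthesize_fap(model, "novel_utterance.wav")
```
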
References (15)
Alex Acero, Xuedong Huang, Hsiao-Wuen Hon, Spoken Language Processing, Prentice-Hall, pp. 1008, (2001)
Keith Waters, Thomas M. Levergood, DECface: an automatic lip-synchronization algorithm for synthetic faces, Technical Report CRL 93/4, (1993)
Yiqiang Chen, Wen Gao, Zhaoqi Wang, Li Zuo, Speech Driven MPEG-4 Based Face Animation via Neural Network, Pacific Rim Conference on Multimedia, pp. 1108-1113, (2001), 10.1007/3-540-45453-5_152
E. Yamamoto, S. Nakamura, K. Shikano, Lip movement synthesis from speech based on hidden Markov models, Speech Communication, vol. 26, pp. 105-115, (1998), 10.1016/S0167-6393(98)00054-5
Tony Ezzat, Gadi Geiger, Tomaso Poggio, Trainable videorealistic speech animation, International Conference on Computer Graphics and Interactive Techniques, vol. 21, pp. 388-398, (2002), 10.1145/566570.566594
P.S. Aleksic, A.K. Katsaggelos, Speech-to-video synthesis using MPEG-4 compliant visual features, IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 682-692, (2004), 10.1109/TCSVT.2004.826760
L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol. 77, pp. 257-286, (1989), 10.1109/5.18626
Yong-Yuan Lin, Ya-Chun Shih, Mau-Tsuen Yang, VEC3D: a 3-D virtual English classroom for second language learning, International Conference on Advanced Learning Technologies, pp. 906-908, (2005), 10.1109/ICALT.2005.302
J.J. Williams, A.K. Katsaggelos, An HMM-based speech-to-video synthesizer, IEEE Transactions on Neural Networks, vol. 13, pp. 900-915, (2002), 10.1109/TNN.2002.1021891
S. Morishima, H. Harashima, Speech-to-image media conversion based on VQ and neural network, International Conference on Acoustics, Speech, and Signal Processing, pp. 2865-2868, (1991), 10.1109/ICASSP.1991.151000