A two-channel training algorithm for hidden Markov model and its application to lip reading

Authors: Liang Dong, Say Wei Foo, Yong Lian

DOI: 10.1155/ASP.2005.1382


Abstract: The hidden Markov model (HMM) has been a popular mathematical approach for sequence classification tasks such as speech recognition since the 1980s. In this paper, a novel two-channel training strategy is proposed for discriminative training of HMMs. For the proposed strategy, a separable-distance function that measures the difference between a pair of training samples is adopted as the criterion function. The symbol emission matrix of an HMM is split into two channels: a static channel that maintains the validity of the HMM, and a dynamic channel that is modified to maximize the separable distance. The parameters of the two-channel HMM are estimated by iterative application of expectation-maximization (EM) operations. As an example of the approach, a hierarchical speaker-dependent visual speech recognition system is trained using two-channel HMMs. Results of experiments on identifying a group of confusable visemes indicate that the proposed approach is able to increase recognition accuracy by an average of 20% compared with conventional HMMs trained with Baum-Welch estimation.
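The two ideas named in the abstract, splitting the emission matrix into two channels and scoring a pair of samples with a separable distance, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not define either quantity, so the sketch assumes that the two channels sum to the emission matrix, that the split is controlled by a hypothetical ratio `omega`, and that the separable distance is the difference of log-likelihoods of the two samples under one model. All function names are illustrative.

```python
import math


def forward_log_likelihood(pi, A, B, obs):
    """Return log P(obs | HMM) using the scaled forward algorithm.

    pi: initial state probabilities, A: state transition matrix,
    B: symbol emission matrix (rows = states, columns = symbols),
    obs: list of observed symbol indices.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        # Rescale at every step to avoid underflow on long sequences.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def split_channels(B, omega):
    """Split B into a static and a dynamic channel (assumed form).

    omega is a hypothetical ratio in (0, 1): the dynamic channel starts
    with a fraction omega of each entry, the static channel keeps the rest.
    """
    static = [[(1.0 - omega) * b for b in row] for row in B]
    dynamic = [[omega * b for b in row] for row in B]
    return static, dynamic


def merge_channels(static, dynamic):
    """Recombine the two channels into one emission matrix."""
    return [[s + d for s, d in zip(rs, rd)] for rs, rd in zip(static, dynamic)]


def separable_distance(pi, A, B, true_obs, false_obs):
    """Assumed criterion: log-likelihood of the true sample minus that
    of the confusable (false) sample under the same model."""
    return (forward_log_likelihood(pi, A, B, true_obs)
            - forward_log_likelihood(pi, A, B, false_obs))


# Toy two-state, two-symbol model (made-up numbers for illustration).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.9, 0.1], [0.3, 0.7]]

static, dynamic = split_channels(B, omega=0.3)
merged = merge_channels(static, dynamic)
distance = separable_distance(pi, A, merged, [0, 0, 1], [1, 1, 0])
```

In this sketch the split leaves every row of the merged matrix summing to one, so the model stays a valid HMM. A discriminative training step would then adjust only the dynamic channel to raise `distance`; that update is not shown here because the abstract does not specify it.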
