Audio-visual continuous speech recognition using a coupled hidden Markov model.

Authors: Ara V. Nefian, Luhong Liang, Yibao Zhao, Xiaoxing Liu, Xiaobo Pi

Abstract: With the increase in the computational complexity of recent computers, audio-visual speech recognition (AVSR) has become an attractive research topic that can lead to a robust solution for speech recognition in noisy environments. In the audio-visual continuous speech recognition system presented in this paper, the audio and visual observation sequences are integrated using a coupled hidden Markov model (CHMM). The statistical properties of the CHMM can describe the asynchrony of the audio and visual features while preserving their natural correlation over time. Experimental results show that the current system, tested on the XM2VTS database, reduces the error rate of the audio-only speech recognition system at an SNR of 0 dB by 55%.
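
To make the coupling idea concrete, the following is a minimal sketch of a two-chain coupled HMM likelihood computed with the forward algorithm. It is an illustration under simplifying assumptions, not the system described in the paper: observations are discrete symbols rather than continuous audio and visual features, and the function name `chmm_likelihood` and all parameter values are hypothetical toy choices. Each chain's next state is conditioned on the previous states of both chains, which is what lets the two streams drift out of step while staying correlated.

```python
from itertools import product


def chmm_likelihood(pi_a, pi_v, trans_a, trans_v, emit_a, emit_v, obs_a, obs_v):
    """Forward-algorithm likelihood of a two-chain coupled HMM (discrete toy version).

    trans_a[k][l][i] = P(audio state i at t | audio state k, visual state l at t-1)
    trans_v[k][l][j] = P(visual state j at t | audio state k, visual state l at t-1)
    emit_a[i][o]     = P(audio symbol o | audio state i)
    emit_v[j][o]     = P(visual symbol o | visual state j)
    """
    states = list(product(range(len(pi_a)), range(len(pi_v))))

    # Initialisation: joint probability of each (audio, visual) state pair
    # and the first pair of observations.
    alpha = {
        (i, j): pi_a[i] * pi_v[j] * emit_a[i][obs_a[0]] * emit_v[j][obs_v[0]]
        for i, j in states
    }

    # Recursion: each chain's transition depends on BOTH previous states.
    for o_a, o_v in zip(obs_a[1:], obs_v[1:]):
        alpha = {
            (i, j): emit_a[i][o_a] * emit_v[j][o_v] * sum(
                alpha[k, l] * trans_a[k][l][i] * trans_v[k][l][j]
                for k, l in states
            )
            for i, j in states
        }

    # Termination: sum over all final state pairs.
    return sum(alpha.values())


# Hypothetical toy parameters: two states and two symbols per stream.
pi_a = [0.6, 0.4]
pi_v = [0.5, 0.5]
trans_a = [[[0.7, 0.3], [0.6, 0.4]], [[0.2, 0.8], [0.1, 0.9]]]
trans_v = [[[0.8, 0.2], [0.3, 0.7]], [[0.6, 0.4], [0.2, 0.8]]]
emit_a = [[0.9, 0.1], [0.2, 0.8]]
emit_v = [[0.7, 0.3], [0.4, 0.6]]

likelihood = chmm_likelihood(
    pi_a, pi_v, trans_a, trans_v, emit_a, emit_v, [0, 0, 1], [0, 1, 1]
)
print(likelihood)
```

The sketch runs the recursion over the product of the two state spaces, which is the simplest correct formulation but scales with the square of the number of state pairs per time step.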

References (10)
Steve Young, Gunnar Evermann, Mark Gales, Thomas Hain, Dan Kershaw, Xunying Liu, Gareth Moore, Julian Odell, Dave Ollason, Dan Povey, Valtcho Valtchev, Phil Woodland. The HTK Book. Cambridge University Engineering Department and Entropic Cambridge Research Laboratory (1995).
Lawrence Rabiner, Biing-Hwang Juang. Fundamentals of Speech Recognition (1993).
J. Luettin, G. Potamianos, C. Neti. Asynchronous stream modeling for large vocabulary audio-visual speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 169-172 (2001). doi:10.1109/ICASSP.2001.940794
Luhong Liang, Xiaoxing Liu, Yibao Zhao, Xiaobo Pi, A.V. Nefian. Speaker independent audio-visual continuous speech recognition. International Conference on Multimedia and Expo, vol. 2, pp. 25-28 (2002). doi:10.1109/ICME.2002.1035365
S. Dupont, J. Luettin. Audio-visual speech modeling for continuous speech recognition. IEEE Transactions on Multimedia, vol. 2, pp. 141-151 (2000). doi:10.1109/6046.865479
Ara V. Nefian, Luhong Liang, Xiaobo Pi, Liu Xiaoxiang, Crusoe Mao, Kevin Murphy. A coupled HMM for audio-visual speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 2013-2016 (2002). doi:10.1109/ICASSP.2002.5745027
M. Oerder, H. Ney. Word graphs: an efficient interface between continuous-speech recognition and language understanding. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 119-122 (1993). doi:10.1109/ICASSP.1993.319246
David G. Stork, Richard O. Duda, Peter E. Hart. Pattern Classification (1973).
C. Neti, G. Potamianos, I. Matthews, Hervé Glotin, D. Vergyri, Juergen Luettin, J. Sison, A. Mashari. Audio-visual speech recognition. Johns Hopkins University-CLSP (2000).