Characteristics of the use of coupled hidden Markov models for audio-visual Polish speech recognition

Authors: M. Kubanek, J. Bobulski, L. Adrjanowicz

DOI: 10.2478/V10175-012-0041-6

Abstract: This paper focuses on combining audio-visual signals for Polish speech recognition in conditions of a highly disturbed audio signal. Recognition was based on combined hidden Markov models (CHMM). The described methods were developed for a single isolated command; nevertheless, their effectiveness indicated that they would also work similarly in continuous audio-visual recognition. The problem of visual analysis is very difficult and computationally demanding, mostly because an extreme amount of data needs to be processed. Therefore, the audio-video method is used only while the audio speech signal is exposed to a considerable level of distortion. The authors' own methods of lip edge detection and characteristic extraction are proposed in this paper. Moreover, a method of fusing the characteristics was tested. A significant increase in processing speed was noted during tests for properly selected CHMM parameters and an adequate codebook size, in addition to the use of an appropriate fusion of characteristics. The experimental results were promising and close to those achieved by leading scientists in the field.

References (16)
Ara V. Nefian, Luhong Liang, Yibao Zhao, Xiaoxing Liu, Xiaobo Pi, Audio-visual continuous speech recognition using a coupled hidden Markov model. Conference of the International Speech Communication Association (2002)
Lawrence Rabiner, Biing-Hwang Juang, Fundamentals of speech recognition (1993)
Sadaoki Furui, Koji Iwano, Satoshi Tamura, Tomoaki Yoshinaga, Audio-visual speech recognition using lip movement extracted from side-face images. AVSP, pp. 117-120 (2003)
M.N. Kaynak, Qi Zhi, A.D. Cheok, K. Sengupta, Ko Chi Chung, Audio-visual modeling for bimodal speech recognition. Systems, Man and Cybernetics, vol. 1, pp. 181-186 (2001), 10.1109/ICSMC.2001.969809
S. Szczepański, M. Wójcikowski, B. Pankiewicz, M. Kłosowski, R. Żaglewski, FPGA and ASIC implementation of the algorithm for traffic monitoring in urban areas. Bulletin of the Polish Academy of Sciences: Technical Sciences, vol. 59, pp. 137-140 (2011), 10.2478/V10175-011-0017-Y
G. Demenko, B. Möbius, K. Klessa, Implementation of Polish speech synthesis for the BOSS system. Bulletin of the Polish Academy of Sciences: Technical Sciences, vol. 58, pp. 371-376 (2010), 10.2478/V10175-010-0035-1
Jongju Shin, Jin Lee, Daijin Kim, Real-time lip reading system for isolated Korean word recognition. Pattern Recognition, vol. 44, pp. 559-571 (2011), 10.1016/J.PATCOG.2010.09.011
Wei Ji Ma, Xiang Zhou, Lars A. Ross, John J. Foxe, Lucas C. Parra, Lip-Reading Aids Word Recognition Most in Moderate Noise: A Bayesian Explanation Using High-Dimensional Feature Space. PLoS ONE, vol. 4, e4638 (2009), 10.1371/JOURNAL.PONE.0004638
Michał Choraś, Human Lips as Emerging Biometrics Modality. International Conference on Image Analysis and Recognition, pp. 993-1002 (2008), 10.1007/978-3-540-69812-8_99