Authors: Ara V. Nefian, Luhong Liang, Xiaobo Pi, Xiaoxing Liu, Kevin Murphy
DOI: 10.1155/S1110865702206083
Keywords: Computer science, Invariant (mathematics), Noise, Statistical model, Pattern recognition, Hidden Markov model, Dynamic Bayesian network, Artificial intelligence, Speech recognition, Word recognition, Audio-visual speech recognition
Abstract: The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with the existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM make it possible to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming all the existing models and the FHMM.