Authors: C. Bregler, S.M. Omohundro, Yochai Konig
DOI: 10.1109/ACSSC.1994.471514
Keywords:
Abstract: We explore multimodal recognition by combining visual lipreading with acoustic speech recognition. We show that combining visual and acoustic information improves performance significantly, especially in noisy environments. This is achieved with a hybrid architecture consisting of a new learning and tracking mechanism, a channel-robust front end, a connectionist phone classifier, and an HMM-based sentence classifier. Our focus in this paper is on the subsystem based on "surface-learning" active vision models. The bimodal system has already been applied to a multi-speaker spelling task, and work is in progress to apply it to the speaker-independent spontaneous speech of the "Berkeley Restaurant Project (BeRP)".
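The abstract describes a hybrid pipeline in which connectionist classifiers produce per-frame phone posteriors that an HMM then decodes into sentences. As an illustration only, the sketch below shows one common way such a bimodal pipeline can be wired: log-linear fusion of acoustic and visual phone posteriors followed by Viterbi decoding. All function names, the fusion weight `w`, and the toy dimensions are hypothetical; the paper itself does not specify this exact fusion scheme.

```python
import numpy as np

def fuse_posteriors(p_audio, p_video, w=0.7):
    """Log-linear fusion of per-frame phone posteriors from two streams.

    p_audio, p_video: (T, N) arrays of posteriors over N phone classes.
    w: acoustic stream weight (hypothetical; would be tuned on held-out data).
    """
    log_p = w * np.log(p_audio + 1e-10) + (1.0 - w) * np.log(p_video + 1e-10)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)  # renormalize per frame

def viterbi(obs_probs, trans, init):
    """Most likely state sequence given per-frame state likelihoods (T, N),
    a transition matrix (N, N), and an initial distribution (N,)."""
    T, N = obs_probs.shape
    delta = np.log(init + 1e-10) + np.log(obs_probs[0] + 1e-10)
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(trans + 1e-10)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(obs_probs[t] + 1e-10)
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):  # backtrack through stored pointers
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy usage: 5 frames, 3 phone classes, random stream posteriors.
rng = np.random.default_rng(0)
pa = rng.dirichlet(np.ones(3), size=5)  # stand-in acoustic posteriors
pv = rng.dirichlet(np.ones(3), size=5)  # stand-in visual posteriors
fused = fuse_posteriors(pa, pv)
trans = np.full((3, 3), 0.1) + 0.7 * np.eye(3)  # sticky self-transitions
print(viterbi(fused, trans, np.full(3, 1 / 3)))
```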