Authors: David Martin Powers, Trent Wilson Lewis
DOI:
Keywords:
Abstract: Audio-Visual Automatic Speech Recognition offers to make speech recognition possible in noisy environments. Early and late fusion approaches dominate the field but may ignore linguistically relevant features. Distinctive features offer an alternative unit, and research has shown that this is feasible on subsets of phonemes [1]. This paper outlines two extended models, multi-class and binary; results suggest that they can achieve a 20 dB gain over audio-only recognition at low SNR.