Large-vocabulary audio-visual speech recognition by machines and humans.

Authors: Chalapathy Neti, Gerasimos Potamianos, Giridharan Iyengar, Eric Helmuth

DOI:

Keywords:

Abstract: We compare automatic recognition with human perception of audio-visual speech in the large-vocabulary continuous speech recognition (LVCSR) domain. Specifically, we study the benefit of the visual modality for both machines and humans when it is combined with audio degraded by speech-babble noise at various signal-to-noise ratios (SNRs). We first consider an automatic speechreading system with a pixel-based visual front end that uses feature fusion for bimodal integration, and we compare its performance with an audio-only LVCSR system. We then describe results of human speech perception experiments, where subjects are asked to transcribe audio-only and audio-visual utterances at various SNRs. For both machines and humans, we observe approximately a 6 dB effective SNR gain compared to the audio-only performance at 10 dB; however, such gains significantly diverge at other SNRs. Furthermore, automatic audio-visual recognition outperforms human audio-only speech perception at low SNRs.

References (1)
Harry McGurk, John MacDonald. Hearing lips and seeing voices. Nature, vol. 264, pp. 746-748 (1976). DOI: 10.1038/264746A0