Large-vocabulary audio-visual speech recognition by machines and humans.

Authors: Chalapathy Neti, Gerasimos Potamianos, Giridharan Iyengar, Eric Helmuth

Abstract: We compare automatic recognition with human perception of audio-visual speech, in the large-vocabulary, continuous speech recognition (LVCSR) domain. Specifically, we study the benefit of the visual modality for both machines and humans, when combined with audio degraded by speech-babble noise at various signal-to-noise ratios (SNRs). We first consider an automatic speechreading system with a pixel-based visual front end that uses feature fusion for bimodal integration, and we compare its performance with an audio-only LVCSR system. We then describe results of human speech perception experiments, where subjects are asked to transcribe audio-only and audio-visual utterances at various SNRs. For both machines and humans, we observe approximately a 6 dB effective SNR gain compared to the audio-only performance at 10 dB; however, such gains significantly diverge at other SNRs. Furthermore, automatic audio-visual recognition outperforms human audio-only speech perception at low SNRs.
