The selective use of gaze in automatic speech recognition

Author: Ao Shen

DOI:

Keywords: Language model, Eye movement, Feature extraction, Speech processing, Gaze, Psychology, Speech recognition, Natural language processing, Modality (human–computer interaction), Speech enhancement, Artificial intelligence, Eye tracking

Abstract: The performance of automatic speech recognition (ASR) degrades significantly in natural environments compared to laboratory assessments. Being a major source of interference, acoustic noise affects intelligibility during the ASR process. Noise causes two main problems: the first is contamination of the speech signal; the second is changes in speakers' vocal and non-vocal behaviour. These phenomena elicit a mismatch between training and deployment conditions, which leads to considerable degradation in recognition performance. To improve noise-robustness, popular approaches exploit prior knowledge through speech enhancement and feature extraction models. An alternative approach, presented in this thesis, is to introduce eye gaze as an extra modality. Eye behaviours play various roles in interaction and contain information about cognition and visual attention, but not all of this information is relevant to speech. Therefore, gaze must be used selectively to improve performance. This is achieved through inference procedures that use noise-dependent models of the temporal and semantic relationship between gaze and speech. 'Selective gaze-contingent ASR' systems are proposed and evaluated on a corpus of eye movement and related speech recorded in clean and noisy environments. The best-performing systems utilise gaze for both inference and language model adaptation.
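
To make the abstract's idea of "selective" gaze use concrete, here is a minimal, hypothetical Python sketch of gaze-contingent rescoring of an ASR n-best list. It is not the thesis's actual method: the names (`Hypothesis`, `gaze_boost`, `rescore`, `noise_level`) and the simple noise-gated interpolation are all assumptions introduced for illustration. It only shows the general principle the abstract states: gaze-derived context influences the language-model side of recognition, and its influence is weighted by the noise conditions.

```python
"""Illustrative sketch (not the thesis's implementation) of selective
gaze-contingent rescoring: gaze-derived context words boost matching
hypotheses, and the boost is gated by an acoustic noise estimate, so
gaze matters most when the audio evidence is least reliable."""

from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list[str]       # recognised word sequence
    acoustic_score: float  # log-likelihood from the acoustic model
    lm_score: float        # log-probability from the baseline language model


def gaze_boost(words: list[str], gaze_words: set[str]) -> float:
    """Count hypothesis words that match gaze-derived context words."""
    return sum(1.0 for w in words if w.lower() in gaze_words)


def rescore(hyps: list[Hypothesis],
            gaze_words: set[str],
            noise_level: float,
            lm_weight: float = 1.0,
            gaze_weight: float = 2.0) -> Hypothesis:
    """Pick the best hypothesis from an n-best list.

    `noise_level` in [0, 1] gates the gaze contribution: in clean audio
    (0.0) gaze is ignored; in heavy noise (1.0) it gets its full weight.
    This gating is the 'selective' use of gaze.
    """
    def score(h: Hypothesis) -> float:
        return (h.acoustic_score
                + lm_weight * h.lm_score
                + noise_level * gaze_weight * gaze_boost(h.words, gaze_words))

    return max(hyps, key=score)


if __name__ == "__main__":
    nbest = [
        Hypothesis(["wreck", "a", "nice", "beach"],
                   acoustic_score=-10.2, lm_score=-6.1),
        Hypothesis(["recognise", "speech"],
                   acoustic_score=-10.8, lm_score=-5.9),
    ]
    # Hypothetical gaze context, e.g. words linked to a fixated object.
    gaze_words = {"recognise", "speech"}
    # In heavy noise the gaze-consistent hypothesis wins the rescoring.
    print(rescore(nbest, gaze_words, noise_level=0.8).words)
```

In this toy run, the acoustically weaker but gaze-consistent hypothesis wins once `noise_level` is high; with `noise_level=0.0` the baseline scores decide, mirroring the abstract's claim that gaze should help selectively rather than always.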
