Author: Ao Shen
DOI:
Keywords: Language model, Eye movement, Feature extraction, Speech processing, Gaze, Psychology, Speech recognition, Natural language processing, Modality (human–computer interaction), Speech enhancement, Artificial intelligence, Eye tracking
Abstract: The performance of automatic speech recognition (ASR) degrades significantly in natural environments compared to laboratory assessments. As a major source of interference, acoustic noise affects speech intelligibility during the ASR process. Noise causes two main problems. The first is contamination of the speech signal. The second is the change in speakers' vocal and non-vocal behaviour. These phenomena create a mismatch between training and recognition conditions, which leads to considerable performance degradation. To improve noise-robustness, popular approaches exploit prior knowledge of the noise in speech enhancement, feature extraction and recognition models. An alternative approach, presented in this thesis, is to introduce eye gaze as an extra modality. Eye gaze behaviours have roles in interaction and contain information about cognition and visual attention; not all of these behaviours are relevant to speech. Therefore, gaze behaviours are used selectively to improve ASR performance. This is achieved by inference procedures using noise-dependent models of gaze behaviours and their temporal and semantic relationship with speech. 'Selective gaze-contingent ASR' systems are proposed and evaluated on a corpus of eye movement and related speech recorded in different clean and noisy environments. The best performing systems utilise both acoustic and language model adaptation.