Authors: Udhyakumar Nallasamy, Tanja Schultz, Florian Metze
DOI:
Keywords:
Abstract: Accented speech that is under-represented in the training data still suffers high Word Error Rate (WER) with state-of-the-art Automatic Speech Recognition (ASR) systems. Careful collection and transcription of training data for different accents can address this issue, but it is both time consuming and expensive. However, for many tasks such as broadcast news or voice search, it is easy to obtain large amounts of audio data from target users with representative accents, albeit without accent labels or even transcriptions. Semi-supervised training techniques have been explored for ASR in the past to leverage such data, but many of these techniques assume homogeneous training and test conditions. In this paper, we experiment with cross-entropy based speaker selection to adapt a source recognizer to a target accent in a semi-supervised manner, using additional data with no accent labels. We compare our technique to self-training based only on confidence scores and show that we obtain significant improvements over the baseline by leveraging additional unlabeled data on two different tasks in Arabic and English.