Semi-supervised learning for speech recognition in the context of accent adaptation.

Authors: Udhyakumar Nallasamy, Tanja Schultz, Florian Metze

Abstract: Accented speech that is under-represented in the training data still suffers high Word Error Rate (WER) with state-of-the-art Automatic Speech Recognition (ASR) systems. Careful collection and transcription of training data for different accents can address this issue, but it is both time consuming and expensive. However, for many tasks such as broadcast news or voice search, it is easy to obtain large amounts of audio data from target users with representative accents, albeit without accent labels or even transcriptions. Semi-supervised training has been explored for ASR in the past to leverage such data, but many of these techniques assume homogeneous training and test conditions. In this paper, we experiment with cross-entropy based speaker selection to adapt a source recognizer to a target accent in a semi-supervised manner, using additional data with no accent labels. We compare our technique to self-training based only on confidence scores and show that we obtain significant improvements over the baseline by leveraging additional unlabeled data on two different tasks in Arabic and English.
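
The abstract does not spell out how the cross-entropy based speaker selection is computed, so the following is only a minimal sketch of one common form of the idea, not the authors' procedure. It assumes each unlabeled speaker is reduced to a list of scalar features and that the source and target accents are each summarized by a one-dimensional Gaussian; the names `select_speakers`, `source_model`, `target_model`, and all toy values are hypothetical. Speakers are ranked by the difference between their empirical cross-entropy under the target model and under the source model, and the most target-like ones are kept.

```python
import math


def gaussian_nll(x, mean, var):
    """Negative log-likelihood of a scalar x under a 1-D Gaussian."""
    return 0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def cross_entropy(frames, mean, var):
    """Empirical cross-entropy: average NLL of a speaker's frames under a model."""
    return sum(gaussian_nll(x, mean, var) for x in frames) / len(frames)


def select_speakers(speakers, source_model, target_model, top_k):
    """Rank speakers by H_target - H_source (lower = more target-like).

    Returns the top_k speaker ids and the full score dictionary.
    """
    scores = {
        spk: cross_entropy(frames, *target_model)
        - cross_entropy(frames, *source_model)
        for spk, frames in speakers.items()
    }
    ranked = sorted(scores, key=scores.get)
    return ranked[:top_k], scores


# Toy, made-up data: three unlabeled speakers with scalar "features".
speakers = {
    "spk_a": [0.1, -0.2, 0.0, 0.3],
    "spk_b": [2.1, 1.8, 2.4, 1.9],
    "spk_c": [1.0, 0.9, 1.2, 0.8],
}
source_model = (0.0, 1.0)  # (mean, variance), assumed source-accent model
target_model = (2.0, 1.0)  # (mean, variance), assumed target-accent model

selected, scores = select_speakers(speakers, source_model, target_model, top_k=1)
print(selected)
```

In a real recognizer the two models would be full acoustic models rather than single Gaussians, and the selected speakers' automatically transcribed audio would then be used for adaptation; those steps are outside this sketch.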
