Unsupervised Discovery and Training of Maximally Dissimilar Cluster Models

Authors: Francoise Beaufays, Vincent Vanhoucke, Brian Strope

Abstract: One of the difficult problems of acoustic modeling for Automatic Speech Recognition (ASR) is how to adequately model the wide variety of acoustic conditions which may be present in the data. The problem is especially acute for tasks such as Google Search by Voice, where the amount of speech available per transaction is small, and adaptation techniques start showing their limitations. As training data from a very large user population is available, however, it is possible to identify and jointly model subsets of the data with similar acoustic qualities. We describe a technique which allows us to perform this modeling at scale on large amounts of data by learning a tree-structured partition of the acoustic space, and we demonstrate that we can significantly improve recognition accuracy in various conditions through unsupervised Maximum Mutual Information (MMI) training. Being fully unsupervised, this technique scales easily to increasing numbers of conditions.
