Authors: Francoise Beaufays, Vincent Vanhoucke, Brian Strope
DOI:
Keywords:
Abstract: One of the difficult problems of acoustic modeling for Automatic Speech Recognition (ASR) is how to adequately model the wide variety of conditions which may be present in the data. The problem is especially acute for tasks such as Google Search by Voice, where the amount of speech available per transaction is small, and adaptation techniques start showing their limitations. As training data from a very large user population is available, however, it is possible to identify and jointly model subsets of the data with similar qualities. We describe a technique which allows us to perform this modeling at scale on large amounts of data by learning a tree-structured partition of the acoustic space, and we demonstrate that it can significantly improve recognition accuracy in various conditions through unsupervised Maximum Mutual Information (MMI) training. Being fully unsupervised, the technique scales easily to increasing numbers of conditions.