Modelling out-of-vocabulary words for robust speech recognition

Authors: James Glass, Issam Bazzi

Abstract: This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. We propose a novel approach for handling OOV words within a single-stage recognition framework. To achieve this goal, an explicit and detailed model of OOV words is constructed and then used to augment the closed-vocabulary search space of a standard speech recognizer. The OOV model achieves open-vocabulary recognition through the use of more flexible subword units that can be concatenated during recognition to form new phone sequences corresponding to potential new words. Examples of such units are phones, syllables, or some automatically-learned multi-phone sequences. Subword units have the attractive property of being a closed set, and are thus able to cover any new words, and conceivably most utterances with partially spoken words as well. The main challenge with such an approach is ensuring that the OOV model does not absorb portions of the speech signal corresponding to in-vocabulary (IV) words. In dealing with this challenge, we explore several research issues related to designing the subword lexicon, the language model, and the topology of the OOV model. We present a dictionary-based approach for estimating subword language models. Such language models are utilized within the subword search space to help recognize the underlying phonetic transcription of OOV words. We also present a data-driven iterative bottom-up procedure for automatically creating a multi-phone subword inventory. Starting with individual phones, this procedure uses the maximum mutual information principle to successively merge phones to obtain longer subword units. The thesis also extends this OOV modelling approach to multiple classes of OOV words. In addition, the thesis examines combining OOV modelling with confidence scoring. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
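
The bottom-up merging procedure mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`best_pair`, `merge_pair`, `learn_units`), the toy corpus, and the scoring formula (a weighted pointwise mutual information, p(a,b) log[p(a,b) / (p(a) p(b))]) are assumptions chosen for illustration, and the exact criterion used in the thesis may differ. Phone labels are assumed not to contain the `_` character, which is used here to join merged units.

```python
import math
from collections import Counter


def best_pair(corpus):
    """Return the adjacent unit pair with the highest weighted pointwise MI.

    Returns None when no pair has a positive score. The score is an
    illustrative assumption, not necessarily the thesis's criterion.
    """
    unit_counts = Counter(u for seq in corpus for u in seq)
    pair_counts = Counter(p for seq in corpus for p in zip(seq, seq[1:]))
    n_units = sum(unit_counts.values())
    n_pairs = sum(pair_counts.values())
    best, best_score = None, 0.0
    # Sorting makes tie-breaking deterministic.
    for (a, b), count in sorted(pair_counts.items()):
        p_ab = count / n_pairs
        p_a = unit_counts[a] / n_units
        p_b = unit_counts[b] / n_units
        score = p_ab * math.log(p_ab / (p_a * p_b))
        if score > best_score:
            best, best_score = (a, b), score
    return best


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` with one joined unit."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + "_" + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_units(corpus, n_merges):
    """Start from individual phones and greedily merge up to `n_merges` pairs."""
    corpus = [list(seq) for seq in corpus]
    for _ in range(n_merges):
        pair = best_pair(corpus)
        if pair is None:
            break
        corpus = [merge_pair(seq, pair) for seq in corpus]
    inventory = sorted({u for seq in corpus for u in seq})
    return inventory, corpus


if __name__ == "__main__":
    # Hypothetical phone transcriptions of three dictionary entries.
    toy = [["k", "ae", "t"], ["b", "ae", "t"], ["k", "ae", "b"]]
    units, segmented = learn_units(toy, 2)
    print(units)
    print(segmented)
```

Each merge shortens the segmented transcriptions while keeping the unit inventory a closed set, which is the property the abstract highlights. A real system would estimate the statistics from a large pronunciation dictionary and would need a stopping rule based on inventory size or score.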
