Improved Speech Recognition using Acoustic and Lexical Correlates of Pitch Accent in a N-Best Rescoring Framework

Authors: Sankaranarayanan Ananthakrishnan, Shrikanth Narayanan

DOI: 10.1109/ICASSP.2007.367209

Abstract: Most statistical speech recognition systems make use of segment-level features, derived mainly from spectral envelope characteristics of the signal, but ignore supra-segmental cues that carry additional information likely to be useful for speech recognition. These cues, which constitute the prosody of the utterance and occur at the syllable, word, and utterance level, are closely related to the lexical and syntactic organization of the utterance. In this paper, we explore the use of acoustic and lexical correlates of a subset of these cues in order to improve recognition performance on a read-speech corpus, using word error rate (WER) as the metric. Using the features and methods described in this paper, we were able to obtain a relative WER improvement of 1.3% over the baseline ASR system on the Boston University Radio News Corpus.
