Prosody dependent speech recognition on radio news corpus of American English

Authors: K. Chen, M. Hasegawa-Johnson, A. Cohen, S. Borys, Sung-Suk Kim

DOI: 10.1109/TSA.2005.853208

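Abstract: Does prosody help word recognition? This paper proposes a novel probabilistic framework in which word and phoneme are dependent on prosody in a way that reduces word error rates (WER) relative to a prosody-independent recognizer with comparable parameter count. In the proposed prosody-dependent speech recognizer, word and phoneme models are conditioned on two important prosodic variables: the intonational phrase boundary and the pitch accent. An information-theoretic analysis is provided to show that prosody-dependent acoustic and language modeling can increase the mutual information between the true word hypothesis and the acoustic observation by exciting the interaction between the prosody-dependent acoustic model and the prosody-dependent language model. Empirically, results indicate that the influence of these prosodic variables on allophonic models is mainly restricted to a small subset of distributions: the duration PDFs (modeled using an explicit duration hidden Markov model, or EDHMM) and the acoustic-prosodic observation PDFs (normalized pitch frequency). The influence of prosody on cepstral features is limited to a subset of phonemes: for example, vowels may be influenced by both accent and phrase position, but phrase-initial and phrase-final consonants are independent of accent. Leveraging these results, effective prosody-dependent allophonic models are built with a minimal increase in parameter count. These prosody-dependent speech recognizers are able to reduce word error rates by up to 11% relative to prosody-independent recognizers with comparable parameter count, in experiments based on the prosodically-transcribed Boston Radio News corpus.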
