Unsupervised joint prosody labeling and modeling for Mandarin speech.

Authors: Chen-Yu Chiang, Sin-Horng Chen, Hsiu-Min Yu, Yih-Ru Wang

DOI: 10.1121/1.3056559

Keywords: Syllable; Speech recognition; Feature (machine learning); Mandarin Chinese; Variation (linguistics); Nonverbal communication; Prosody; Natural language; Juncture; Linguistics; Computer science; Discriminative model

Abstract: An unsupervised joint prosody labeling and modeling method for Mandarin speech is proposed, a new scheme intended to construct statistical prosodic models and to label prosodic tags consistently for Mandarin speech. Two types of prosodic tags are determined by four prosodic models designed to illustrate the hierarchy of Mandarin prosody: the break of a syllable juncture, to demarcate prosodic constituents, and the prosodic state, to represent any prosodic domain’s pitch-level variation resulting from its upper-layered prosodic constituents’ influences. The performance of the proposed method was evaluated using an unlabeled read-speech corpus articulated by an experienced female announcer. Experimental results showed that the estimated parameters of the four prosodic models were able to explore and describe the structures and patterns of Mandarin prosody. Besides, certain corresponding relationships between the break indices labeled and the associated words were found, and manifested the connections between prosodic and linguistic parameters, a finding further verifying the capability of the method presented. Finally, a quantitative comparison in labeling results between the proposed method and human labelers indicated that the former was more consistent and discriminative than the latter in prosodic feature distributions, a merit of the method developed here on the applications of prosody modeling.
