Prosody and phonetic variability: Lessons learned from acoustic model clustering

Authors: Mari Ostendorf, Richard Wright, Izhak Shafran

Abstract: Most research on the use of prosody in automatic speech processing has focused on F0, energy and duration correlates to prosodic structure. However, there are multiple sources of evidence suggesting that there are spectral correlates as well. This paper presents an analysis of prosodically labeled conversational speech data using acoustic parameters and clustering techniques that are standard in speech recognition. We find spectral differences primarily associated with segment position at constituent onsets and with prominent syllables. Importantly, phones at fluent vs. disfluent boundaries are frequently placed in different clusters. These differences can be leveraged in a “multiple pronunciation” model to aid in detecting boundaries, and potentially for improving recognition accuracy.
