Prosody and phonetic variability: Lessons learned from acoustic model clustering

Authors: Mari Ostendorf, Richard Wright, Izhak Shafran

Abstract: Most research on the use of prosody in automatic speech processing has focused on F0, energy and duration correlates to prosodic structure. However, there are multiple sources of evidence suggesting that there are spectral correlates as well. This paper presents an analysis of prosodically labeled conversational speech data using acoustic parameters and clustering techniques standard in speech recognition. We find differences primarily associated with segment position at constituent onsets and in prominent syllables. Importantly, phones at fluent vs. disfluent boundaries are frequently placed in different clusters. These differences can be leveraged in a "multiple pronunciation" model to aid in detecting boundaries, and potentially for improving recognition accuracy.

PDF: isca-speech.org

