Authors: Mari Ostendorf, Richard Wright, Izhak Shafran
DOI:
Keywords:
Abstract: Most research on the use of prosody in automatic speech processing has focused on F0, energy and duration correlates to prosodic structure. However, there are multiple sources of evidence suggesting that there are spectral correlates as well. This paper presents an analysis of prosodically labeled conversational speech data using acoustic parameters and clustering techniques that are standard in speech recognition. We find differences primarily associated with segment position at constituent onsets and in prominent syllables. Importantly, phones at fluent vs. disfluent boundaries are frequently placed in different clusters. These differences can be leveraged in a “multiple pronunciation” model to aid in detecting boundaries, and potentially for improving recognition accuracy.