Acoustic model clustering based on syllable structure

Authors: Izhak Shafran, Mari Ostendorf

DOI: 10.1016/S0885-2308(02)00049-9

Keywords: Computer science; Speech recognition; Word (computer architecture); Syllable; Artificial intelligence; Tree (data structure); Natural language processing; Cluster analysis; Variation (linguistics); Context (language use); Syllabic verse; Acoustic model

Abstract: Current speech recognition systems perform poorly on conversational speech as compared to read speech, arguably due to the large acoustic variability inherent in conversational speech. Our hypothesis is that there are systematic effects of local context, associated with syllabic structure, that are not being captured by current models. Such variation may be modeled using a broader definition of context than traditional approaches, which restrict context to neighboring phonemes. In this paper, we study the use of word- and syllable-level conditioning for recognizing conversational speech. We describe a method to extend standard tree-based clustering to incorporate a number of such features, and report results on the Switchboard task indicating that conditioning on syllable structure outperforms pentaphones while incurring less computational cost. It has been hypothesized in previous work that syllable-based models for English were limited because they ignore the phenomenon of resyllabification (change in syllable structure at word boundaries), but our analysis shows that accounting for it does not impact performance.
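To make the clustering idea concrete, below is a minimal sketch, not the authors' implementation, of likelihood-based decision-tree state clustering in which the candidate questions include both conventional neighboring-phone classes and a broader syllable-structure feature of the kind the abstract argues for. The single-dimensional Gaussian statistics, the toy contexts, and the specific question names are illustrative assumptions only.

```python
# Illustrative sketch of tree-based acoustic state clustering with a
# broadened question set (neighboring-phone classes plus a hypothetical
# syllable-position feature). Not the method from the paper.
import math
from collections import namedtuple

# Each "state" carries a context descriptor and sufficient statistics
# (count, mean, variance of a 1-D feature, for illustration only).
State = namedtuple("State", "context count mean var")

def log_likelihood(states):
    """Gaussian log-likelihood of pooling a set of states (1-D case)."""
    n = sum(s.count for s in states)
    if n == 0:
        return 0.0
    mean = sum(s.count * s.mean for s in states) / n
    # Pooled second moment gives the pooled variance.
    second = sum(s.count * (s.var + s.mean ** 2) for s in states) / n
    var = max(second - mean ** 2, 1e-6)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def best_split(states, questions):
    """Pick the yes/no question that maximizes the likelihood gain."""
    base = log_likelihood(states)
    best = (None, 0.0, None, None)
    for name, test in questions.items():
        yes = [s for s in states if test(s.context)]
        no = [s for s in states if not test(s.context)]
        if not yes or not no:
            continue
        gain = log_likelihood(yes) + log_likelihood(no) - base
        if gain > best[1]:
            best = (name, gain, yes, no)
    return best

def grow_tree(states, questions, min_gain=1.0):
    """Recursively split until no question yields enough likelihood gain."""
    name, gain, yes, no = best_split(states, questions)
    if name is None or gain < min_gain:
        return {"leaf": [s.context for s in states]}
    return {"question": name,
            "yes": grow_tree(yes, questions, min_gain),
            "no": grow_tree(no, questions, min_gain)}

if __name__ == "__main__":
    # Toy contexts: (left phone class, right phone class, syllable position).
    data = [
        State(("vowel", "nasal", "onset"), 40, 1.2, 0.3),
        State(("vowel", "nasal", "coda"), 35, 2.4, 0.4),
        State(("stop", "vowel", "onset"), 50, 1.1, 0.2),
        State(("stop", "vowel", "coda"), 30, 2.6, 0.5),
    ]
    questions = {
        "left-is-vowel": lambda c: c[0] == "vowel",
        "right-is-nasal": lambda c: c[1] == "nasal",
        # Broader-context question beyond neighboring phonemes:
        "in-syllable-coda": lambda c: c[2] == "coda",
    }
    print(grow_tree(data, questions))
```

In this toy setup the syllable-position question yields the largest likelihood gain and is chosen first, illustrating how a clustering tree can exploit syllable structure without enumerating the much larger pentaphone context space.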

References (2)
Mari Ostendorf, Richard Wright, Izhak Shafran, "Prosody and phonetic variability: Lessons learned from acoustic model clustering" (2003).
J. J. Godfrey, E. C. Holliman, J. McDaniel, "SWITCHBOARD: telephone speech corpus for research and development," International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, pp. 517-520 (1992). DOI: 10.1109/ICASSP.1992.225858