Authors: Izhak Shafran, Mari Ostendorf
DOI: 10.1016/S0885-2308(02)00049-9
Keywords: Computer science, Speech recognition, Word (computer architecture), Syllable, Artificial intelligence, Tree (data structure), Natural language processing, Cluster analysis, Variation (linguistics), Context (language use), Syllabic verse, Acoustic model
Abstract: Current speech recognition systems perform poorly on conversational speech as compared to read speech, arguably due to the large acoustic variability inherent in conversational speech. Our hypothesis is that there are systematic effects in local context, associated with syllabic structure, that are not being captured in current acoustic models. Such variation may be modeled using a broader definition of context than in traditional systems, which restrict context to the neighboring phonemes. In this paper, we study the use of word- and syllable-level context conditioning in recognizing conversational speech. We describe a method to extend standard tree-based clustering to incorporate a large number of features, and we report results on the Switchboard task which indicate that syllable structure outperforms pentaphones and incurs less computational cost. It has been hypothesized that previous work using syllable models for recognition of English was limited because of ignoring the phenomenon of resyllabification (change of syllable structure at word boundaries), but our analysis shows that accounting for resyllabification does not impact recognition performance.