Authors: I. Shafran, M. Ostendorf
DOI: 10.1109/ICASSP.2000.859136
Keywords:
Abstract: Current speech recognition systems perform poorly on conversational speech as compared to read speech, largely because of the additional acoustic variability observed in conversational speech. Our hypothesis is that there are systematic effects, related to higher level structures, that are not being captured in the current models. In this paper we describe a method to extend standard clustering to incorporate such features in estimating acoustic models. We report improvements obtained on the Switchboard task over triphones and pentaphones by the use of word- and syllable-level features. In addition, we report preliminary studies with prosodic information.