Author: John-Paul Hosom
DOI: 10.1016/J.SPECOM.2008.11.003
Keywords:
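Abstract: Determining the location of phonemes is important to a number of speech applications, including training of automatic speech recognition systems, building text-to-speech systems, and research on human speech processing. Agreement of humans on the location of phonemes is, on average, 93.78% within 20 ms on a variety of corpora, and 93.49% within 20 ms on the TIMIT corpus. We describe a baseline forced-alignment system and a proposed system with several modifications to this baseline. Modifications include the addition of energy-based features to the standard cepstral feature set, the use of probabilities of a state transition given an observation, and the computation of probabilities of distinctive phonetic features instead of phoneme-level probabilities. Performance of the baseline system on the test partition of the TIMIT corpus is 91.48% within 20 ms, and performance of the proposed system on this corpus is 93.36% within 20 ms. The results of the proposed system are a 22% relative reduction in error over the baseline system, and a 14% reduction in error over results from a non-HMM alignment system. This result of 93.36% agreement is the best known reported result on the TIMIT corpus.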
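The two figures of merit in the abstract, agreement within 20 ms and relative reduction in error, can be illustrated with a minimal sketch. The function names, the inclusive `<=` tolerance test, and the toy boundary times below are assumptions made for illustration only; they are not taken from the paper. Only the 91.48% and 93.36% values come from the abstract.

```python
def agreement_within(reference_ms, hypothesis_ms, tolerance_ms=20.0):
    """Percentage of boundaries whose absolute timing error is within the tolerance.

    Assumes the two lists are already paired boundary-for-boundary, and that
    "within 20 ms" is inclusive (an assumption, not stated in the abstract).
    """
    if len(reference_ms) != len(hypothesis_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1 for ref, hyp in zip(reference_ms, hypothesis_ms)
        if abs(ref - hyp) <= tolerance_ms
    )
    return 100.0 * hits / len(reference_ms)


def relative_error_reduction(baseline_agreement, proposed_agreement):
    """Relative reduction in error (percent), where error = 100 - agreement."""
    baseline_error = 100.0 - baseline_agreement
    proposed_error = 100.0 - proposed_agreement
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical boundary times in milliseconds (made-up data, not from TIMIT).
reference = [100, 250, 400, 620]
hypothesis = [105, 275, 395, 630]
print(agreement_within(reference, hypothesis))  # 3 of 4 boundaries are within 20 ms

# Figures from the abstract: baseline 91.48%, proposed 93.36%.
print(round(relative_error_reduction(91.48, 93.36)))  # 22
```

With the abstract's numbers, the error falls from 8.52% to 6.64%, and (8.52 - 6.64) / 8.52 is about 0.22, which matches the reported 22% relative reduction over the baseline system.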