Speaker-independent phoneme alignment using transition-dependent states

Author: John-Paul Hosom

DOI: 10.1016/J.SPECOM.2008.11.003

Abstract: Determining the location of phonemes is important to a number of speech applications, including training automatic speech recognition systems, building text-to-speech systems, and research on human speech processing. Agreement among humans on the location of phonemes is, on average, 93.78% within 20 ms on a variety of corpora, and 93.49% within 20 ms on the TIMIT corpus. We describe a baseline forced-alignment system and a proposed system with several modifications to this baseline. Modifications include the addition of energy-based features to the standard cepstral feature set, the use of probabilities of a state transition given an observation, and the computation of probabilities of distinctive phonetic features instead of phoneme-level probabilities. Performance of the baseline system on the test partition of the TIMIT corpus is 91.48% within 20 ms, and performance of the proposed system on this corpus is 93.36% within 20 ms. The results of the proposed system are a 22% relative reduction in error over the baseline system, and a 14% reduction in error over results from a non-HMM alignment system. This level of agreement is the best known reported result.
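The agreement and error-reduction figures in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the paper's evaluation code: the function names `agreement_within` and `relative_error_reduction` are hypothetical, the example boundary times are invented for demonstration, and treating "within 20 ms" as an inclusive threshold is an assumption.

```python
def agreement_within(reference_ms, hypothesis_ms, tolerance_ms=20.0):
    """Percentage of boundaries placed within tolerance_ms of the reference.

    Assumes both lists hold boundary times (in ms) for the same phoneme
    sequence, in the same order. The inclusive comparison is an assumption.
    """
    if len(reference_ms) != len(hypothesis_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        abs(ref - hyp) <= tolerance_ms
        for ref, hyp in zip(reference_ms, hypothesis_ms)
    )
    return 100.0 * hits / len(reference_ms)


def relative_error_reduction(baseline_agreement, proposed_agreement):
    """Relative reduction in error (percent), where error = 100 - agreement."""
    baseline_error = 100.0 - baseline_agreement
    proposed_error = 100.0 - proposed_agreement
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Figures from the abstract: 91.48% (baseline) and 93.36% (proposed).
reduction = relative_error_reduction(91.48, 93.36)
print(f"{reduction:.1f}% relative reduction in error")  # about 22%

# Invented boundary times, for illustration only.
example = agreement_within([100, 250, 400, 610], [105, 275, 395, 612])
print(f"{example:.1f}% of boundaries within 20 ms")
```

The baseline error is 8.52% and the proposed system's error is 6.64%, so the difference of 1.88 points divided by 8.52 gives roughly 22%, matching the reduction stated in the abstract.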
