A spectral/temporal method for robust fundamental frequency tracking

Authors: Stephen A. Zahorian, Hongbing Hu

DOI: 10.1121/1.2916590

Abstract: In this paper, a fundamental frequency (F0) tracking algorithm is presented that is extremely robust for both high quality and telephone speech, at signal to noise ratios ranging from clean speech to very noisy speech. The algorithm is named "YAAPT," for "yet another algorithm for pitch tracking." The algorithm is based on a combination of time domain processing, using the normalized cross correlation, and frequency domain processing. Major steps include processing of the original acoustic signal and a nonlinearly processed version of the signal, the use of a new method for computing a modified autocorrelation function that incorporates information from multiple spectral harmonic peaks, peak picking to select multiple F0 candidates and associated figures of merit, and extensive use of dynamic programming to find the "best" track among the multiple F0 candidates. The algorithm was evaluated by using three databases and compared to three other published F0 tracking algorithms by using both high quality and telephone speech for various noise conditions. For clean speech, the error rates obtained are comparable to those obtained with the best results reported for any other algorithm; for noisy telephone speech, the error rates obtained are lower than those obtained with other methods.
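The time domain part of the pipeline described in the abstract (normalized cross correlation, peak picking with figures of merit, and dynamic programming over candidates) can be illustrated with a minimal Python sketch. This is not the published YAAPT implementation: it omits the nonlinear processing and the spectral harmonic stage, and the function names `nccf`, `pick_candidates`, and `best_track`, the `jump_weight` parameter, and the cost function (one minus merit plus a log-frequency jump penalty) are all assumptions chosen for illustration.

```python
import numpy as np

def nccf(frame, lag_min, lag_max):
    """Normalized cross correlation of a frame with lagged copies of itself."""
    n = len(frame) - lag_max                 # length of the comparison window
    ref = frame[:n]
    ref_energy = np.dot(ref, ref)
    values = np.zeros(lag_max - lag_min + 1)
    for i, lag in enumerate(range(lag_min, lag_max + 1)):
        seg = frame[lag:lag + n]
        denom = np.sqrt(ref_energy * np.dot(seg, seg))
        values[i] = np.dot(ref, seg) / denom if denom > 0 else 0.0
    return values

def pick_candidates(values, lag_min, fs, max_cands=3):
    """Return local maxima of the NCCF as (f0_hz, merit), highest merit first."""
    peaks = [i for i in range(1, len(values) - 1)
             if values[i] > values[i - 1] and values[i] >= values[i + 1]]
    peaks.sort(key=lambda i: values[i], reverse=True)
    return [(fs / (lag_min + i), float(values[i])) for i in peaks[:max_cands]]

def best_track(cands_per_frame, jump_weight=1.0):
    """Viterbi-style search for the lowest-cost candidate sequence.

    Assumed cost (illustrative only): (1 - merit) per chosen candidate plus
    jump_weight * |log2(f / f_prev)| per transition. Every frame must hold
    at least one candidate.
    """
    cost = [1.0 - m for _, m in cands_per_frame[0]]
    back = []
    for t in range(1, len(cands_per_frame)):
        prev = cands_per_frame[t - 1]
        new_cost, ptr = [], []
        for f, m in cands_per_frame[t]:
            trans = [cost[j] + jump_weight * abs(np.log2(f / pf))
                     for j, (pf, _) in enumerate(prev)]
            j = int(np.argmin(trans))
            new_cost.append(trans[j] + 1.0 - m)
            ptr.append(j)
        cost = new_cost
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    k = int(np.argmin(cost))
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [cands_per_frame[t][k][0] for t, k in enumerate(path)]

# Candidates for one synthetic frame: 200 Hz tone plus its second harmonic.
fs = 8000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
cands = pick_candidates(nccf(frame, lag_min=20, lag_max=133), lag_min=20, fs=fs)

# Hand-made candidates: the middle frame's highest-merit candidate is an
# octave error (100 Hz), which the continuity penalty should reject.
frames = [[(200.0, 0.90), (100.0, 0.80)],
          [(100.0, 0.92), (202.0, 0.90)],
          [(204.0, 0.90), (102.0, 0.70)]]
track = best_track(frames)
```

For a perfectly periodic frame the NCCF peaks at every multiple of the pitch period, so the candidate list contains subharmonics alongside the true F0. That ambiguity is one reason a track-level search is useful: in the hand-made example, choosing the highest-merit candidate frame by frame would produce an octave jump, while the continuity penalty keeps the track near 200 Hz.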
