Understanding Tonal Languages

Author: Stephen A. Zahorian

DOI:

Keywords: Hidden Markov model; Natural language processing; Context (language use); Mandarin Chinese; Artificial intelligence; Computer science; Syllable; Mel-frequency cepstrum; Speech recognition; Feature (machine learning); Tone (musical instrument); Linguistic Data Consortium

Abstract: This report gives a detailed summary of research work completed under Air Force Research Laboratory (AFRL) grant 56236 over the period November 17, 2010 to November 16, 2012. The main objective was to study various methods for Mandarin syllable recognition. Techniques were explored for both base syllable recognition and lexical tone recognition. The RASC863 database, obtained from the Chinese Linguistic Data Consortium, was used for the experimental work. Base phone recognition (60 phones) was done with a Hidden Markov Model recognizer; the best results were approximately 69%. Human listeners were used to establish a baseline for tone recognition accuracy. Tone accuracy for humans ranges from about 55% to 90%, depending on how much context is given to the listeners. The best tone classification accuracy with a neural network classifier is 76%; with the Hidden Markov Model recognizer it is 71%. In addition to the ASR experiments with Mandarin, basic work was done on improved pitch tracking and on refinement of spectral/temporal features (DCTCs/DCSCs). It was determined that longer time intervals are preferred for dynamic feature calculations than are typically used with MFCC features. Also, segment intervals for Mandarin should be somewhat longer than for English.
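The abstract's remark about interval length for dynamic feature calculations can be illustrated with a minimal sketch. It assumes the regression-based delta formula commonly applied to MFCC frames, not the report's own DCTC/DCSC computation. The function name `delta_features` and the parameter `half_window` are hypothetical and chosen for illustration; a larger `half_window` corresponds to a longer time interval.

```python
import numpy as np


def delta_features(frames, half_window=2):
    """Regression-based delta (dynamic) features over 2*half_window + 1 frames.

    frames: array of shape (num_frames, num_coeffs), e.g. cepstral features.
    half_window: hypothetical parameter; frames on each side of the center
    frame that contribute to the slope estimate.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.ndim != 2:
        raise ValueError("frames must have shape (num_frames, num_coeffs)")
    if half_window < 1:
        raise ValueError("half_window must be at least 1")

    num_frames = frames.shape[0]
    # Repeat the first and last frames so edge frames have full context.
    padded = np.pad(frames, ((half_window, half_window), (0, 0)), mode="edge")
    # Normalizer of the least-squares slope: 2 * sum of n^2 for n = 1..N.
    denom = 2.0 * sum(n * n for n in range(1, half_window + 1))

    deltas = np.zeros_like(frames)
    for n in range(1, half_window + 1):
        ahead = padded[half_window + n: half_window + n + num_frames]
        behind = padded[half_window - n: half_window - n + num_frames]
        deltas += n * (ahead - behind)
    return deltas / denom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for cepstral frames: a slow trend plus noise.
    trend = np.linspace(0.0, 1.0, 50)[:, None] * np.ones((1, 3))
    noisy = trend + 0.05 * rng.standard_normal(trend.shape)
    short = delta_features(noisy, half_window=2)
    longer = delta_features(noisy, half_window=6)
    print("std of deltas, short window: ", short.std())
    print("std of deltas, longer window:", longer.std())
```

With a 10 ms frame step (an assumption, since the abstract does not state the frame rate), `half_window=2` spans 50 ms and `half_window=6` spans 130 ms. The longer window averages the slope over more frames, which trades time resolution for a smoother estimate.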

References (12)
Tan Lee, Mei-Yuh Hwang, Xin Lei, Mari Ostendorf, Man-Hung Siu. Improved Tone Modeling for Mandarin Broadcast News Speech Recognition. Conference of the International Speech Communication Association, vol. 3, pp. 1237- (2006).
Eric Chang, Shuo Di, Jian-Lai Zhou, Chao Huang, Kai-Fu Lee. Large vocabulary Mandarin speech recognition with different approaches in modeling tones. Conference of the International Speech Communication Association, pp. 983-986 (2000).
B.R. Glasberg, B.C.J. Moore. A revision of Zwicker's loudness model. Acustica, vol. 82, pp. 335-345 (1996).
Hank Chang-Han Huang, F. Seide. Pitch tracking and tone features for Mandarin speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1523-1526 (2000). doi:10.1109/ICASSP.2000.861942
S.A. Zahorian, P. Silsbee, Xihong Wang. Phone classification with segmental features and a binary-pair partitioned neural network classifier. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 1011-1014 (1997). doi:10.1109/ICASSP.1997.596111
Stefan Strahl, Alfred Mertins. Analysis and design of gammatone signal models. Journal of the Acoustical Society of America, vol. 126, pp. 2379-2389 (2009). doi:10.1121/1.3212919
Santitham Prom-on, Fang Liu, Yi Xu. Post-low bouncing in Mandarin Chinese: acoustic analysis and computational modeling. Journal of the Acoustical Society of America, vol. 132, pp. 421-432 (2012). doi:10.1121/1.4725762
Stephen A. Zahorian, Hongbing Hu. A spectral/temporal method for robust fundamental frequency tracking. The Journal of the Acoustical Society of America, vol. 123, pp. 4559-4571 (2008). doi:10.1121/1.2916590
S. Davis, P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, pp. 65-74 (1980). doi:10.1109/TASSP.1980.1163420