Improved Tone Modeling for Mandarin Broadcast News Speech Recognition

Authors: Tan Lee, Mei-Yuh Hwang, Xin Lei, Mari Ostendorf, Man-Hung Siu

DOI:

Keywords:

Abstract: Tone plays a crucial role in Mandarin speech in distinguishing ambiguous words. Most state-of-the-art Mandarin automatic speech recognition systems adopt embedded tone modeling, where tonal acoustic units are used and F0 features are appended to the spectral feature vector. In this paper, we combine the embedded tone modeling approach (using improved F0 smoothing) with explicit tone modeling in rescoring the output lattices. Oracle experiments indicate that a 32% relative improvement can be achieved by rescoring with perfect tone information. Recognition experiments on Mandarin broadcast news show that, even with an accuracy of only 70%, the explicit tone classifier offers complementary knowledge and improves recognition performance significantly. Through the combination of the tone modeling techniques, the character error rate on the CTV test set is improved from 13.0% to 11.5%.
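
The rescoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an N-best list in place of a full lattice, a simple log-linear combination of scores, and a hypothetical `tone_weight` parameter. The function name `rescore` and all scores below are invented for illustration.

```python
import math

def rescore(hypotheses, tone_weight=1.0):
    """Return the hypothesis with the best combined score.

    Each hypothesis is a tuple (text, base_log_score, tone_posteriors).
    base_log_score is the first-pass recognizer score (assumed to be a
    log score), and tone_posteriors holds the explicit tone classifier's
    probability for the tone of each syllable in that hypothesis.
    """
    def combined(hyp):
        _, base, tone_posteriors = hyp
        # Sum of log posteriors, floored to avoid log(0).
        tone_log = sum(math.log(max(p, 1e-10)) for p in tone_posteriors)
        return base + tone_weight * tone_log

    return max(hypotheses, key=combined)

# Toy N-best list with invented scores (illustration only).
nbest = [
    ("hyp_a", -10.0, [0.2, 0.3]),  # best first-pass score, poor tone match
    ("hyp_b", -10.5, [0.8, 0.9]),  # slightly worse first pass, good tone match
]

best_with_tone = rescore(nbest)
best_without_tone = rescore(nbest, tone_weight=0.0)
```

With the tone term switched off (`tone_weight=0.0`) the first-pass ranking is kept. With it on, the hypothesis whose tones the classifier supports can overtake the first-pass winner, which is the sense in which an explicit tone model adds complementary knowledge.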
