Hierarchical Processing of the Modulation Spectrum for GALE Mandarin LVCSR system

Authors: Christian Plahl, Fabio Valente, Mathew Magimai.-Doss, Ravuri Suman

Keywords: Speech recognition, Reduction (complexity), Critical band, Data set, Mandarin Chinese, Energy (signal processing), Computer science, Modulation spectrum

Abstract: This paper investigates the use of TANDEM features based on hierarchical processing of the modulation spectrum. The study is done in the framework of the GALE project for recognition of Mandarin Broadcast data. We describe the improvements obtained using the hierarchical processing and the addition of features like pitch and short-term critical band energy. Results are consistent with previous findings on a different LVCSR task, suggesting that the proposed technique is effective and robust across several conditions. Furthermore, we describe the integration of these features into the RWTH GALE Mandarin LVCSR system trained on 1600 hours of data and present the progress across the GALE 2007 and 2008 RWTH systems, resulting in an approximately 20% CER reduction on the data set.