Multi-stream to many-stream: using spectro-temporal features for ASR.

Authors: Nelson Morgan, Suman V. Ravuri, Sherry Y. Zhao

Abstract: We report progress in the use of multi-stream spectro-temporal features for both small and large vocabulary automatic speech recognition tasks. In this approach, features are divided into multiple streams for parallel processing and dynamic utilization. In experiments, the incorporation of up to 28 dynamically weighted feature streams along with MFCCs yields roughly 21% improvement on the baseline in low-noise conditions and 47% in noise-added conditions, a greater improvement than in our previous work. A four-stream framework yields a 14% improvement over the baseline experiment. These results suggest that dividing the features into multiple streams may be an effective way to flexibly utilize an inherently large number of features for recognition.

References (14)
Tan Lee, Mei-Yuh Hwang, Xin Lei, Mari Ostendorf, Man-Hung Siu, Improved Tone Modeling for Mandarin Broadcast News Speech Recognition. Conference of the International Speech Communication Association, vol. 3, pp. 1237- (2006)
Nelson Morgan, Sherry Y. Zhao, Multi-stream spectro-temporal features for robust speech recognition. Conference of the International Speech Communication Association, pp. 898-901 (2008)
Michael Kleinschmidt, Localized spectro-temporal features for automatic speech recognition. Conference of the International Speech Communication Association (2003)
H. Bourlard, S. Dupont, A new ASR approach based on independent processing and recombination of partial frequency bands. International Conference on Spoken Language Processing, vol. 1, pp. 426-429 (1996), 10.1109/ICSLP.1996.607145
Noboru Kanedera, Takayuki Arai, Hynek Hermansky, Misha Pavel, On the relative importance of various components of the modulation spectrum for automatic speech recognition. Speech Communication, vol. 28, pp. 43-55 (1999), 10.1016/S0167-6393(99)00002-3
Taishih Chi, Yujie Gao, Matthew C. Guyton, Powen Ru, Shihab Shamma, Spectro-temporal modulation transfer functions and speech intelligibility. Journal of the Acoustical Society of America, vol. 106, pp. 2719-2732 (1999), 10.1121/1.428100
M. Ostendorf, P. Jain, H. Hermansky, D. Ellis, G. Doddington, B. Chen, O. Cretin, H. Bourlard, M. Athineos, N. Morgan, Qifeng Zhu, A. Stolcke, K. Sonmez, S. Sivadas, T. Shinozaki, Pushing the envelope - aside [speech recognition]. IEEE Signal Processing Magazine, vol. 22, pp. 81-88 (2005), 10.1109/MSP.2005.1511826
Hynek Hermansky, Petr Fousek, Multi-resolution RASTA filtering for TANDEM-based ASR. Conference of the International Speech Communication Association, pp. 361-364 (2005)
H. Misra, H. Bourlard, V. Tyagi, New entropy based combination rules in HMM/ANN multi-stream ASR. International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 741-744 (2003), 10.1109/ICASSP.2003.1202473
Hynek Hermansky, Fabio Valente, On the Combination of Auditory and Modulation Frequency Channels for ASR applications. Conference of the International Speech Communication Association, pp. 2242-2245 (2008)