On the relative importance of various components of the modulation spectrum for automatic speech recognition

Authors: Noboru Kanedera, Takayuki Arai, Hynek Hermansky, Misha Pavel

DOI: 10.1016/S0167-6393(99)00002-3

Abstract: We measured the accuracy of speech recognition as a function of band-pass filtering of the time trajectories of spectral envelopes. We examined (i) several types of recognizers, such as dynamic time warping (DTW) and hidden Markov model (HMM), and (ii) several types of features, such as filter bank output, mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We used the resulting recognition data to determine the relative importance of information in different components of the modulation spectrum of speech for automatic speech recognition. We concluded that: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy.
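
The band-pass filtering of a feature time trajectory described in the abstract can be illustrated with a minimal sketch. It is not the authors' filter design: it uses idealized DFT-bin masking as a stand-in, and the function name `bandpass_trajectory`, the 100 Hz frame rate, and the synthetic test signal are all assumptions made for illustration only.

```python
import cmath
import math


def bandpass_trajectory(trajectory, frame_rate_hz, low_hz, high_hz):
    """Keep only modulation frequencies in [low_hz, high_hz] of one feature trajectory.

    Hypothetical helper: idealized DFT-bin masking, not the filters used in the paper.
    """
    n = len(trajectory)
    # Forward DFT of the trajectory (O(n^2), acceptable for a short illustration).
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(trajectory))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 are the negative-frequency mirror images of bins below n/2.
        mod_freq_hz = min(k, n - k) * frame_rate_hz / n
        if not (low_hz <= mod_freq_hz <= high_hz):
            spectrum[k] = 0j
    # Inverse DFT; the mask is symmetric, so the result is real for real input.
    return [
        sum(s * cmath.exp(2j * math.pi * k * t / n) for k, s in enumerate(spectrum)).real / n
        for t in range(n)
    ]


if __name__ == "__main__":
    frame_rate = 100.0  # assumed frame rate (one feature frame every 10 ms)
    frames = 100        # one second of frames
    # Synthetic trajectory: constant offset + 4 Hz modulation + 30 Hz modulation.
    traj = [
        5.0
        + math.sin(2 * math.pi * 4 * t / frame_rate)
        + 0.5 * math.sin(2 * math.pi * 30 * t / frame_rate)
        for t in range(frames)
    ]
    filtered = bandpass_trajectory(traj, frame_rate, 1.0, 16.0)
    print(filtered[:5])
```

In this sketch the 1 to 16 Hz pass band removes the constant offset and the 30 Hz component while leaving the 4 Hz modulation intact. In a recognizer front end the same operation would be applied independently to each feature dimension (for example, each filter bank channel or cepstral coefficient).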

References (11)
T. Arai, M. Pavel, H. Hermansky, C. Avendano. Intelligibility of speech with filtered time trajectories of spectral envelopes. International Conference on Spoken Language Processing, vol. 4, pp. 2490-2493 (1996). DOI: 10.1109/ICSLP.1996.607318
S. Furui. Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, pp. 52-59 (1986). DOI: 10.1109/TASSP.1986.1164788
Rob Drullman, Joost M. Festen, Reinier Plomp. Effect of temporal envelope smearing on speech reception. The Journal of the Acoustical Society of America, vol. 95, pp. 1053-1064 (1994). DOI: 10.1121/1.408467
B. S. Atal. Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. The Journal of the Acoustical Society of America, vol. 55, pp. 1304-1312 (1974). DOI: 10.1121/1.1914702
T. Houtgast, H. J. M. Steeneken. A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria. The Journal of the Acoustical Society of America, vol. 77, pp. 1069-1077 (1985). DOI: 10.1121/1.392224
Effect of reducing slow temporal modulations on speech reception. The Journal of the Acoustical Society of America, vol. 95, pp. 2670-2680 (1994). DOI: 10.1121/1.409836
Hynek Hermansky. Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, vol. 87, pp. 1738-1752 (1990). DOI: 10.1121/1.399423
H. Hermansky, N. Morgan. RASTA processing of speech. IEEE Transactions on Speech and Audio Processing, vol. 2, pp. 578-589 (1994). DOI: 10.1109/89.326616
H. Hermansky, N. Morgan, H.-G. Hirsch. Recognition of speech in additive and convolutional noise based on RASTA spectral processing. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 83-86 (1993). DOI: 10.1109/ICASSP.1993.319236