On the relative importance of various components of the modulation spectrum for automatic speech recognition

Authors: Noboru Kanedera, Takayuki Arai, Hynek Hermansky, Misha Pavel

Abstract: We measured the accuracy of speech recognition as a function of band-pass filtering of the time trajectories of spectral envelopes. We examined (i) several types of recognizers, such as dynamic time warping (DTW) and hidden Markov model (HMM), and (ii) several types of features, such as filter bank output, mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We used the resulting recognition data to determine the relative importance of information in different modulation spectral components of speech for automatic speech recognition. We concluded that: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy.
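As a rough illustration of what band-pass filtering the time trajectory of one spectral-envelope channel means, the sketch below keeps only modulation frequencies between 1 and 16 Hz. It is not the filter design used in the paper: it assumes a frame rate of 100 frames per second, uses a simple brick-wall mask in the DFT domain, and runs on a synthetic trajectory. The names `bandpass_trajectory`, `tone`, and `rms` are hypothetical helpers introduced only for this example.

```python
import cmath
import math


def bandpass_trajectory(traj, frame_rate_hz, low_hz=1.0, high_hz=16.0):
    """Keep only modulation frequencies in [low_hz, high_hz] of one trajectory.

    Assumption: an ideal (brick-wall) DFT mask, chosen for brevity.
    """
    n = len(traj)
    # Forward DFT of the time trajectory (O(n^2), fine for a short example).
    spectrum = [
        sum(traj[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 hold negative frequencies; fold them back.
        freq_hz = min(k, n - k) * frame_rate_hz / n
        if not (low_hz <= freq_hz <= high_hz):
            spectrum[k] = 0.0  # discard components outside the pass band
    # Inverse DFT; the mask is symmetric, so the result is real up to rounding.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def rms(values):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic trajectory (assumed values): 2 s at 100 frames/s, with a constant
# offset plus 0.5 Hz, 4 Hz, and 30 Hz modulation components.
frame_rate = 100.0
n_frames = 200


def tone(freq_hz):
    return [math.sin(2 * math.pi * freq_hz * t / frame_rate)
            for t in range(n_frames)]


slow, syllabic, fast = tone(0.5), tone(4.0), tone(30.0)
mixed = [5.0 + a + b + c for a, b, c in zip(slow, syllabic, fast)]

# Only the 4 Hz component lies inside the 1-16 Hz pass band.
filtered = bandpass_trajectory(mixed, frame_rate)
```

In a recognizer front end, the same operation would be applied independently to each feature channel (for example each filter bank output or cepstral coefficient) before recognition. A practical system would more likely use a causal FIR or IIR filter than a whole-utterance DFT mask.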

References (11)

T. Arai, M. Pavel, H. Hermansky, C. Avendano, Intelligibility of speech with filtered time trajectories of spectral envelopes, International Conference on Spoken Language Processing, vol. 4, pp. 2490-2493 (1996), 10.1109/ICSLP.1996.607318

Corpus-based methods in language and speech processing, Springer Netherlands (1997), 10.1007/978-94-017-1183-8

S. Furui, Speaker-independent isolated word recognition using dynamic features of speech spectrum, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, pp. 52-59 (1986), 10.1109/TASSP.1986.1164788

Rob Drullman, Joost M. Festen, Reinier Plomp, Effect of temporal envelope smearing on speech reception, The Journal of the Acoustical Society of America, vol. 95, pp. 1053-1064 (1994), 10.1121/1.408467

B. S. Atal, Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, The Journal of the Acoustical Society of America, vol. 55, pp. 1304-1312 (1974), 10.1121/1.1914702

T. Houtgast, H. J. M. Steeneken, A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria, Journal of the Acoustical Society of America, vol. 77, pp. 1069-1077 (1985), 10.1121/1.392224

Effect of reducing slow temporal modulations on speech reception, Journal of the Acoustical Society of America, vol. 95, pp. 2670-2680 (1994), 10.1121/1.409836

Hynek Hermansky, Perceptual linear predictive (PLP) analysis of speech, Journal of the Acoustical Society of America, vol. 87, pp. 1738-1752 (1990), 10.1121/1.399423

H. Hermansky, N. Morgan, RASTA processing of speech, IEEE Transactions on Speech and Audio Processing, vol. 2, pp. 578-589 (1994), 10.1109/89.326616

H. Hermansky, N. Morgan, H.-G. Hirsch, Recognition of speech in additive and convolutional noise based on RASTA spectral processing, IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 83-86 (1993), 10.1109/ICASSP.1993.319236
