Authors: Noboru Kanedera, Takayuki Arai, Hynek Hermansky, Misha Pavel
DOI: 10.1016/S0167-6393(99)00002-3
Keywords:
Abstract: We measured the accuracy of speech recognition as a function of band-pass filtering of the time trajectories of spectral envelopes. We examined (i) several types of recognizers, such as dynamic time warping (DTW) and hidden Markov model (HMM), and (ii) several types of features, such as filter bank output, mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We used the resulting recognition data to determine the relative importance of information in different modulation spectral components of speech for automatic speech recognition. We concluded that: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy.
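
The band-pass filtering of a feature time trajectory described in the abstract can be illustrated with a minimal sketch. It is not the authors' filter design: it assumes an ideal DFT-domain mask, a frame rate of 100 frames per second, and a synthetic single-channel trajectory. The names `bandpass_trajectory`, `rms`, `FRAME_RATE_HZ`, and `N_FRAMES` are hypothetical and chosen only for this example.

```python
import cmath
import math


def bandpass_trajectory(trajectory, frame_rate_hz, low_hz, high_hz):
    """Keep only modulation components within [low_hz, high_hz].

    trajectory: one feature channel (e.g. a log filter-bank energy) sampled
    once per frame. An ideal DFT-domain mask is used for brevity; it is an
    illustrative stand-in, not the filter used in the paper.
    """
    n = len(trajectory)
    # Forward DFT (O(n^2), acceptable for a short illustration).
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(trajectory))
        for k in range(n)
    ]
    # Zero every bin whose modulation frequency lies outside the pass band.
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies.
        freq_hz = min(k, n - k) * frame_rate_hz / n
        if not (low_hz <= freq_hz <= high_hz):
            spectrum[k] = 0.0
    # Inverse DFT; the result is real because the mask is symmetric.
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def rms(values):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic trajectory at an assumed 100 frames/s: a 4 Hz component (kept)
# plus a constant offset and a 30 Hz component (both outside 1-16 Hz).
FRAME_RATE_HZ = 100.0
N_FRAMES = 200
signal_4hz = [math.sin(2 * math.pi * 4 * t / FRAME_RATE_HZ) for t in range(N_FRAMES)]
trajectory = [
    2.0 + s + 0.5 * math.sin(2 * math.pi * 30 * t / FRAME_RATE_HZ)
    for t, s in enumerate(signal_4hz)
]

filtered = bandpass_trajectory(trajectory, FRAME_RATE_HZ, 1.0, 16.0)
# Difference between the filtered trajectory and the pure 4 Hz component.
residual = rms([f - s for f, s in zip(filtered, signal_4hz)])
print(f"RMS residual after 1-16 Hz band-pass: {residual:.2e}")
```

In a recognizer front end, the same operation would be applied independently to each feature channel before the features are passed to the DTW or HMM back end. A practical system would more likely use a causal FIR or IIR filter than a whole-utterance DFT mask.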