Filtering the time sequences of spectral parameters for speech recognition

Authors: Climent Nadeu, Pau Pachès-Leal, Biing-Hwang Juang

DOI: 10.1016/S0167-6393(97)00030-7

Abstract: In automatic speech recognition, the signal is usually represented by a set of time sequences of spectral parameters (TSSPs) that model the temporal evolution of the spectral envelope frame-to-frame. Those sequences are then filtered either to make them more robust to environmental conditions or to compute differential parameters (dynamic features), which enhance discrimination. In this paper, we apply frequency analysis to TSSPs in order to provide an interpretation framework for the various types of parameter filters used so far. Thus, the analysis of the average long-term spectrum of the successfully filtered TSSPs reveals a combined effect of equalization and band selection that provides insights into TSSP filtering. Also, we show in the paper that, when supplementary differential parameters are not used, the recognition rate can be improved even for clean speech, just by properly filtering the TSSPs. To support this claim, a number of experimental results are presented, both using whole-word and subword based models. The empirically optimum filters attenuate the low-pass band and emphasize a higher band so that the peak of the average long-term spectrum of the output of these filters lies at around the average syllable rate of the employed database (≈3 Hz).
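As a rough illustration of what filtering a TSSP means, the sketch below applies a regression ("delta") FIR filter along the frame axis of a single parameter trajectory and evaluates the filter's magnitude response at a given modulation frequency. It is a minimal sketch, not the filters evaluated in the paper: the function names, the half-width of 2 frames, the edge handling, and the 100 Hz frame rate are all assumptions made for the example.

```python
import cmath
import math


def delta_filter(half_width):
    """Taps h[k], k = -K..K, of a regression (slope) filter; they sum to zero."""
    norm = sum(k * k for k in range(-half_width, half_width + 1))
    return [k / norm for k in range(-half_width, half_width + 1)]


def filter_trajectory(x, taps):
    """Filter one parameter trajectory along time: y[t] = sum_k h[k] * x[t + k].

    Frames beyond either end are replaced by the nearest edge frame
    (an assumption of this sketch).
    """
    half_width = len(taps) // 2
    n = len(x)
    out = []
    for t in range(n):
        acc = 0.0
        for i, h in enumerate(taps):
            idx = min(max(t + i - half_width, 0), n - 1)
            acc += h * x[idx]
        out.append(acc)
    return out


def magnitude_response(taps, freq_hz, frame_rate_hz):
    """|H| of the filter at a modulation frequency given in Hz."""
    half_width = len(taps) // 2
    w = 2.0 * math.pi * freq_hz / frame_rate_hz
    h_w = sum(h * cmath.exp(-1j * w * (i - half_width)) for i, h in enumerate(taps))
    return abs(h_w)


# Hypothetical usage: one trajectory (e.g. a single cepstral coefficient
# sampled once per frame), assuming 100 frames per second.
taps = delta_filter(2)
trajectory = [math.sin(2.0 * math.pi * 3.0 * t / 100.0) for t in range(200)]
filtered = filter_trajectory(trajectory, taps)
gain_at_3_hz = magnitude_response(taps, 3.0, 100.0)
```

Because the taps sum to zero, the filter has no gain at 0 Hz, so a constant offset in the trajectory is removed. This is one simple way to attenuate the lowest modulation frequencies while passing higher ones.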

References (27)
Antonio Bonafonte, Eugenio Vives, Rafael Estany. Study of subword units for Spanish speech recognition. Conference of the International Speech Communication Association, pp. 1607-1610 (1995).
T. Arai, M. Pavel, H. Hermansky, C. Avendano. Intelligibility of speech with filtered time trajectories of spectral envelopes. International Conference on Spoken Language Processing, vol. 4, pp. 2490-2493 (1996). DOI: 10.1109/ICSLP.1996.607318
Climent Nadeu, Mónica Gorricho, Javier Hernando. On the decorrelation of filter-bank energies in speech recognition. Conference of the International Speech Communication Association, pp. 1381-1384 (1995).
Lawrence Rabiner, Biing-Hwang Juang. Fundamentals of Speech Recognition (1993).
C. Nadeu, J.B. Marino, J. Hernando, A. Nogueiras. Frequency and time filtering of filter-bank energies for HMM speech recognition. International Conference on Spoken Language Processing, vol. 1, pp. 430-433 (1996). DOI: 10.1109/ICSLP.1996.607146
Alan V. Oppenheim, Ronald W. Schafer. Discrete-Time Signal Processing (1989).
P. Paches-Leal, C. Nadeu. On parameter filtering in continuous subword-unit-based speech recognition. International Conference on Spoken Language Processing, vol. 2, pp. 1065-1068 (1996). DOI: 10.1109/ICSLP.1996.607789
Brian A. Hanson, Ted H. Applebaum, Jean-Claude Junqua. Spectral Dynamics for Speech Recognition Under Adverse Conditions. Springer, Boston, MA, pp. 331-356 (1996). DOI: 10.1007/978-1-4613-1367-0_14
S. Furui. Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, pp. 52-59 (1986). DOI: 10.1109/TASSP.1986.1164788
Rob Drullman, Joost M. Festen, Reinier Plomp. Effect of temporal envelope smearing on speech reception. The Journal of the Acoustical Society of America, vol. 95, pp. 1053-1064 (1994). DOI: 10.1121/1.408467