Variable frame rate hierarchical analysis for robust speech recognition

Authors: Jean Rouat, Stephane Loiselle, Stephane Molotchnikoff

DOI: 10.1109/IROS.2011.6094672

Keywords: Speaker recognition, Word recognition, Pattern recognition, Voice activity detection, Speech processing, Signal-to-noise ratio, Variable frame rate, Feature extraction, Speech recognition, Mel-frequency cepstrum, Hidden Markov model, Artificial intelligence, Computer science

Abstract: A new bio-inspired speech analysis system that extracts acoustical events is proposed and used in the design of a variable frame rate (VFR) recognizer. The same recognizer (Hidden Markov Model, HMM, with Mel Frequency Cepstrum Coefficients, MFCC) has been used with the VFR approach and with the conventional fixed frame rate (FFR) approach. In comparison with other recognizers, the hierarchical features have the potential to serve as classification parameters in a complete speech recognition system. Also, no voice activity detection is required and there are no hard decisions to be taken by the system. Events label and identify moments at which properties are stable or changing. These markers are used on an analysis window that can be positioned to perform recognition. Inspired by our knowledge of the auditory and visual systems, complex features like transients in energy and orientation are used. Training is done on clean speech and recognition on noisy (signal-to-noise ratios, SNR, from 20 dB to −10 dB) and reverberated speech, using the TI 46-word database corrupted with 4 noises from the Aurora 2 data. In comparison with the FFR recognizer, VFR yields more than a 50% increase in recognition rates for the speaker independent isolated word recognition task when SNRs are between 0 and 20 dB.
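The abstract reports results over SNRs from 20 dB down to −10 dB. As a rough illustration of what such a condition means, the sketch below scales a noise signal so that the speech-to-noise power ratio hits a target SNR before the two are added. It is not the authors' procedure: the function names (`power`, `mix_at_snr`, `measured_snr_db`), the sine-wave stand-in for speech and the Gaussian stand-in for noise are all assumptions made for this example.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals
    `target_snr_db`, then add it to `speech`.

    Returns (mixed, scaled_noise). Hypothetical helper for illustration.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # Required noise power is P_speech / 10^(SNR/10); gain is the square
    # root of the ratio between required and actual noise power.
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


def measured_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(power(speech) / power(noise))


# Stand-in signals (assumptions): a 440 Hz tone at 8 kHz and white noise.
random.seed(0)
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]

for target in (20, 10, 0, -10):
    mixed, scaled = mix_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, scaled), 6))
```

At 0 dB the scaled noise carries the same average power as the speech. At −10 dB the noise power is ten times the speech power.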

References (33)
Jeffery A. Winer, Christoph E. Schreiner. The Central Auditory System: A Functional Analysis. Springer, New York, NY, pp. 1-68 (2005). DOI: 10.1007/0-387-27083-3_1
Dan Jurafsky, James H. Martin. Speech and Language Processing (1999).
Martin Cooke, Phil D. Green, Jon Barker. Robust ASR Based On Clean Speech Models: An Evaluation of Missing Data Techniques For Connected Digit Recognition in Noise. Conference of the International Speech Communication Association, pp. 213-217 (2001).
David Verstraeten, Benjamin Schrauwen, Dirk Stroobandt. Isolated word recognition using a Liquid State Machine. European Symposium on Artificial Neural Networks, pp. 435-440 (2005).
Stéphane Loiselle, Jean Rouat, Daniel Pressnitzer, Simon Thorpe. Exploration of rank order coding with spiking neural networks for speech recognition. International Joint Conference on Neural Networks, vol. 4, pp. 2076-2080 (2005). DOI: 10.1109/IJCNN.2005.1556220
Arfan Ghani, T. Martin McGinnity, Liam P. Maguire, Jim Harkin. Neuro-inspired Speech Recognition with Recurrent Spiking Neurons. International Conference on Artificial Neural Networks, pp. 513-522 (2008). DOI: 10.1007/978-3-540-87536-9_53
Jean Rouat, Stéphane Loiselle, Ramin Pichevar. Towards neurocomputational speech and sound processing. Progress in Nonlinear Speech Processing, pp. 58-77 (2007). DOI: 10.1007/978-3-540-71505-4_4
Whitlow W. L. Au. The Mammalian auditory pathway: neurophysiology. Journal of the Acoustical Society of America, vol. 95, pp. 1697-1698 (1992). DOI: 10.1121/1.408521
Gregory Hickok, David Poeppel. The cortical organization of speech processing. Nature Reviews Neuroscience, vol. 8, pp. 393-402 (2007). DOI: 10.1038/NRN2113