SPEECH RECOGNITION USING NEURAL NETWORKS

Authors: Malur K. Sundareshan, Pablo Zegers


Keywords: Acoustic model, Neural gas, Feature vector, Time delay neural network, Artificial intelligence, Recurrent neural network, Computer science, Speaker recognition, Cepstrum, Speech recognition, Feature (machine learning), Pattern recognition

Abstract: Although speech recognition products are already available on the market, their development is mainly based on statistical techniques that work under very specific assumptions. The work presented in this thesis investigates the feasibility of alternative approaches for solving the problem more efficiently. A speech recognizer system comprising two distinct blocks, a Feature Extractor and a Recognizer, is presented. The Feature Extractor block uses a standard LPC Cepstrum coder, which translates the incoming speech into a trajectory in a feature space, followed by a Self Organizing Map, which tailors the outcome of the coder in order to produce optimal trajectory representations of words in reduced-dimension feature spaces. Designs of the Recognizer block based on three different approaches are compared: the performance of Template, MultiLayer Perceptron, and Recurrent Neural Network based recognizers is tested on a small, isolated-word, speaker-dependent recognition problem. Experimental results indicate that trajectories in such reduced-dimension spaces can provide reliable representations of spoken words, while reducing the complexity of training and operating the Recognizer. The comparison between the Recognizer designs conducted here gives a better understanding of the problem and its possible solutions. A new learning procedure that optimizes the usage of the training set is also presented. Optimal tailoring of trajectories,
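The Feature Extractor described above (an LPC Cepstrum coder followed by a Self Organizing Map that turns each utterance into a trajectory) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names (`lpc`, `lpc_to_cepstrum`, `cepstral_frames`, `train_som`, `trajectory`), the frame length, hop, LPC order, number of cepstral coefficients, the 2-D map size, and the learning-rate and neighbourhood schedules are all hypothetical choices. LPC coefficients come from the autocorrelation method with the Levinson-Durbin recursion, and the cepstrum from the standard LPC-to-cepstrum recursion (gain term omitted).

```python
import numpy as np


def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """Predictor coefficients a_1..a_p (x[n] ~ sum_k a_k x[n-k]) from the
    autocorrelation method, solved with the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    if r[0] <= 0.0:  # silent frame: no meaningful predictor
        return a[1:]
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient of stage i, then update the lower-order terms.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a: np.ndarray, n_ceps: int) -> np.ndarray:
    """c_n = a_n + sum_{k} (k/n) c_k a_{n-k}, with a_n = 0 for n > p.
    The gain term c_0 is omitted."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def cepstral_frames(signal, frame_len=256, hop=128, order=12, n_ceps=12):
    """Hamming-windowed frames -> one LPC cepstrum vector per frame.
    All default parameters are assumptions, not values from the thesis."""
    window = np.hamming(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([
        lpc_to_cepstrum(lpc(signal[s:s + frame_len] * window, order), n_ceps)
        for s in starts
    ])


def train_som(data, rows=6, cols=6, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Online Kohonen SOM on a hypothetical rows x cols grid.
    Returns the codebook vectors and their 2-D grid coordinates."""
    rng = np.random.default_rng(seed)
    weights = data[rng.choice(len(data), size=rows * cols)].astype(float)
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian neighbourhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, grid


def trajectory(features, weights, grid):
    """Map each cepstral frame to the grid position of its best-matching unit."""
    d2 = ((features[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return grid[np.argmin(d2, axis=1)]
```

Under these assumptions, a word becomes a sequence of 2-D map coordinates, for example `trajectory(cepstral_frames(sig), *train_som(cepstral_frames(sig)))`. A sequence of this kind is the reduced-dimension trajectory that a Template, MultiLayer Perceptron, or Recurrent Neural Network recognizer would consume. In practice the map would be trained on frames pooled from many training utterances rather than from a single signal.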
