On the generalization capability of multi-layered networks in the extraction of speech properties

Authors: Yoshua Bengio, Renato De Mori, Piero Cosi

Abstract: The paper describes a speech coding system based on an ear model followed by a set of Multi-Layer Networks (MLNs). The MLNs are trained to recognize articulatory features such as the place and manner of articulation. Experiments performed on 10 English vowels show a recognition rate higher than 95% for new speakers. When the features are used for recognition, comparable results are obtained for vowels and diphthongs not used for training and pronounced by new speakers. This suggests that MLNs suitably fed by the data computed by an ear model have good generalization capabilities over new speakers and new sounds.

References (8)
Stephanie Seneff. A joint synchrony/mean-rate model of auditory speech processing. Journal of Phonetics, vol. 16, pp. 101-111 (1990). 10.1016/S0095-4470(19)30466-8
Lokendra Shastri, Raymond L. Watrous. Learning phonetic features using connectionist networks. International Joint Conference on Artificial Intelligence, pp. 851-854 (1987).
Renato De Mori, Pietro Laface, Yu Mong. Parallel Algorithms for Syllable Recognition in Continuous Speech. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-7, pp. 56-69 (1985). 10.1109/TPAMI.1985.4767618
David C. Plaut, Geoffrey E. Hinton. Learning sets of filters using back-propagation. Computer Speech & Language, vol. 2, pp. 35-61 (1987). 10.1016/0885-2308(87)90026-X
Y. Bengio, R. Cardin, R. De Mori, E. Merlo. Programmable execution of multi-layered networks for automatic speech recognition. Communications of the ACM, vol. 32, pp. 195-199 (1989). 10.1145/63342.63345
H.C. Leung, V.W. Zue. Some phonetic recognition experiments using artificial neural nets. International Conference on Acoustics, Speech, and Signal Processing, pp. 422-425 (1988). 10.1109/ICASSP.1988.196608
D. E. Rumelhart, G. E. Hinton, R. J. Williams. Learning internal representations by error propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 318-362 (1986).
Geoffrey E. Hinton, Toshiyuki Hanazawa, Alex Waibel, Kiyohiro Shikano, Kevin J. Lang. Phoneme recognition: neural networks vs. hidden Markov models. International Conference on Acoustics, Speech, and Signal Processing, pp. 107-110 (1988).