Phoneme recognition using time-delay neural networks

Authors: A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K.J. Lang

DOI: 10.1109/29.21701

Keywords:

Abstract: The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time and therefore not blurred by temporal shifts in the input. As a recognition task, the speaker-dependent recognition of the phonemes B, D, and G in varying phonetic contexts was chosen. For comparison, several discrete hidden Markov models (HMM) were trained to perform the same task. Performance evaluation over 1946 testing tokens from three speakers showed that the TDNN achieves a recognition rate of 98.5% correct, while the rate obtained by the best of the HMMs was only 93.7%.
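The second property, detecting features independently of their position in time, can be illustrated with a minimal sketch. It is not the authors' network: the single sigmoid unit, the weights, the frame dimension, the window width, and the helper names (`time_delay_layer`, `integrate_over_time`, `place_pattern`) are all arbitrary choices made for illustration. The sketch applies one set of shared weights to every window of consecutive frames and then sums the activations over time, so the same pattern placed earlier or later in the input yields the same integrated score.

```python
import math

def time_delay_layer(frames, weights, bias):
    """Apply one time-delay unit to every window of consecutive frames.

    frames:  list of T frames, each a list of D floats
    weights: list of W delay taps, each a list of D floats; the same
             weights are reused at every time step (weight sharing)
    Returns a list of T - W + 1 sigmoid activations, one per window.
    """
    width = len(weights)
    outputs = []
    for t in range(len(frames) - width + 1):
        total = bias
        for d in range(width):
            for w, x in zip(weights[d], frames[t + d]):
                total += w * x
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs

def integrate_over_time(activations):
    """Sum the unit's activations over all time positions."""
    return sum(activations)

def place_pattern(pattern, offset, total_frames):
    """Embed a short pattern in an otherwise all-zero input (hypothetical helper)."""
    dim = len(pattern[0])
    frames = [[0.0] * dim for _ in range(total_frames)]
    for i, frame in enumerate(pattern):
        frames[offset + i] = list(frame)
    return frames

# Arbitrary toy values: 3 delay taps, 2 features per frame.
weights = [[0.5, -1.0], [1.5, 0.25], [-0.75, 1.0]]
bias = -0.5
pattern = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]

# The same pattern, starting at frame 3 and at frame 8 of a 15-frame input.
early = place_pattern(pattern, 3, 15)
late = place_pattern(pattern, 8, 15)

act_early = time_delay_layer(early, weights, bias)
act_late = time_delay_layer(late, weights, bias)
score_early = integrate_over_time(act_early)
score_late = integrate_over_time(act_late)

# The activations are shifted copies of each other; the integrated
# scores agree up to floating-point rounding.
print(abs(score_early - score_late) < 1e-9)
```

The equality holds here because the pattern stays far enough from both edges of the input that every window overlapping it is fully evaluated. A pattern cut off at a boundary would break the invariance.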

References (25)
David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams, Learning representations by back-propagating errors. Nature, vol. 323, pp. 533-536 (1986). DOI: 10.1038/323533A0
Peter F. Brown, The acoustic-modeling problem in automatic speech recognition. Interim Report, Carnegie-Mellon Univ. (1987). DOI: 10.21236/ADA188529
A.-M. Derouault, Context-dependent phonetic Markov models for large vocabulary speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 12, pp. 360-363 (1987). DOI: 10.1109/ICASSP.1987.1169604
James K. Baker, Stochastic modeling as a means of automatic speech recognition. Carnegie Mellon University (1975).
J. L. Elman, J. L. McClelland, Interactive processes in speech perception: the TRACE model. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, pp. 58-121 (1986).
R.L. Watrous, L. Shastri, A.H. Waibel, Learned phonetic discrimination using connectionist networks. ECST, pp. 409-412 (1990). DOI: 10.1016/B978-0-08-051584-7.50039-5
D. Lubensky, Learning spectral-temporal dependencies using connectionist networks. International Conference on Acoustics, Speech, and Signal Processing, pp. 418-421 (1988). DOI: 10.1109/ICASSP.1988.196607
R. Schwartz, Y. Chow, O. Kimball, S. Roucos, M. Krasner, J. Makhoul, Context-dependent modeling for acoustic-phonetic recognition of continuous speech. International Conference on Acoustics, Speech, and Signal Processing, vol. 10, pp. 1205-1208 (1985). DOI: 10.1109/ICASSP.1985.1168283
Frederick Jelinek, Continuous speech recognition by statistical methods. Proceedings of the IEEE, vol. 64, pp. 532-556 (1976). DOI: 10.1109/PROC.1976.10159
R.W. Prager, T.D. Harrison, F. Fallside, Boltzmann machines for speech recognition. Computer Speech & Language, vol. 1, pp. 3-27 (1986). DOI: 10.1016/S0885-2308(86)80008-0