A comparative study of continuous speech recognition using neural networks and hidden Markov models

Authors: S. Renals, D. McKelvie, F. McInnes

DOI: 10.1109/ICASSP.1991.150353

Keywords:

Abstract: The recognition performances of two front ends are compared for continuous speech tasks. First, a neural network model (NNM) front end was used, with frame labeling performed by a radial basis function network and segmentation by a Viterbi algorithm. The second front end was a discrete hidden Markov model (HMM), featuring explicit state duration probability distributions. Two experiments were performed. The first used a speaker-dependent database, with a lexicon of 571 words. Using a low-perplexity grammar, the NNM front end produced a word accuracy of 94% and a sentence accuracy of 86%. This was slightly inferior to the HMM front end, which produced accuracies of 96% and 88%. Without a grammar, word accuracies of 58% (NNM) and 49% (HMM) were recorded. The second set of experiments used the MIT portion of the TIMIT database (415 speakers and 2072 sentences in total). Results were poor for both front ends, with one producing marginally better results.

References (10)
Fergus R. McInnes, Alan Wrench, Yasuo Ariki. Enhancement and optimisation of a speech recognition front end based on hidden Markov models. Conference of the International Speech Communication Association, pp. 2461-2464 (1989).
David Lowe, David S. Broomhead. Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks. Complex Systems, vol. 2, pp. 321-355 (1988).
L. Bahl, P. Brown, P. de Souza, R. Mercer. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. International Conference on Acoustics, Speech, and Signal Processing, vol. 11, pp. 49-52 (1986). DOI: 10.1109/ICASSP.1986.1169179
S. Renals, R. Rohwer. Learning phoneme recognition using neural networks. International Conference on Acoustics, Speech, and Signal Processing, pp. 413-416 (1989). DOI: 10.1109/ICASSP.1989.266453
F. Fallside, H. Lucke, T.P. Marsland, P.J. O'Shea, M.S.J. Owen, R.W. Prager, A.J. Robinson, N.H. Russell. Continuous speech recognition for the TIMIT database using neural networks. International Conference on Acoustics, Speech, and Signal Processing, pp. 445-448 (1990). DOI: 10.1109/ICASSP.1990.115745
H. Bourlard, C.J. Wellekens. Speech pattern discrimination and multilayer perceptrons. Computer Speech & Language, vol. 3, pp. 1-19 (1989). DOI: 10.1016/0885-2308(89)90011-9
William M. Fisher, Victor Zue, Jared Bernstein, David S. Pallett. An acoustic-phonetic data base. The Journal of the Acoustical Society of America, vol. 81, pp. S92-S93 (1987). DOI: 10.1121/1.2034854
T. Kohonen. The 'neural' phonetic typewriter. Computer, vol. 21, pp. 11-22 (1988). DOI: 10.1109/2.28