A hierarchical decision approach to large-vocabulary discrete utterance recognition

Authors: T. Kaneko, N. Dixon

DOI: 10.1109/TASSP.1983.1164211

Keywords: Two stages, Computer science, Utterance, Dynamic programming, Vocabulary, Speech recognition, Linear predictive coding, Response time, Robustness (computer science), Computation, Signal processing

Abstract: Very short response time is a critical requirement for automatic discrete utterance recognition. The real-time vocabulary size of most of today's commercially available recognizers is limited to several hundred utterances, primarily due to the fact that detailed acoustic matching involves considerable computation. The method presented here offers an economical solution to the large-vocabulary recognition problem by carrying out recognition in two stages. In the initial stage, the incoming utterance is linearly matched against the entire vocabulary using only utterance duration and either two or three average spectra of each utterance as features. While the number of prototypes is large, the computation required per match is substantially reduced. During this stage, a preset number of best-match utterances is determined for the unknown input. The second stage of matching is performed on this list, based upon more detailed features (e.g., 10-ms log-power spectra) and a more elaborate methodology, e.g., dynamic programming (DP). Evaluation experiments were conducted using the 2000 most frequent words of an office-correspondence corpus, with normal adult-male talkers. It was observed that first-stage lists of 30-50 items included the "correct" utterance between 99.0 and 99.5 percent of the time. Using DP on the spectral samples, accuracy ranged from 86.5 to 94.5 percent. A match-limiter, when used with a 50-64-word recognizer, makes near-real-time large-vocabulary recognition feasible.
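The two-stage scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the Euclidean frame distance, the equal-length segmentation used for the average spectra, the duration weight, and the length-normalized DP score are all assumptions chosen for illustration. Each utterance is taken to be a list of spectral frames (lists of floats).

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two spectral frames (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def coarse_features(frames, n_segments=3):
    """Stage-one features: duration in frames plus one mean spectrum per segment."""
    n, dim = len(frames), len(frames[0])
    means = []
    for s in range(n_segments):
        lo = s * n // n_segments
        hi = max(lo + 1, (s + 1) * n // n_segments)  # never an empty segment
        seg = frames[lo:hi]
        means.append([sum(f[d] for f in seg) / len(seg) for d in range(dim)])
    return n, means

def coarse_distance(fa, fb, duration_weight=1.0):
    """Cheap linear match: summed mean-spectrum distances plus a duration penalty."""
    (na, ma), (nb, mb) = fa, fb
    spectral = sum(frame_dist(x, y) for x, y in zip(ma, mb))
    return spectral + duration_weight * abs(na - nb)

def dtw_distance(a, b):
    """Stage-two match: dynamic-programming time alignment over all frames."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = frame_dist(a[i - 1], b[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)] / (len(a) + len(b))  # normalize by total length

def recognize(frames, prototypes, shortlist_size=50):
    """Return (best word, shortlist) for an unknown utterance.

    prototypes maps each vocabulary word to its frame sequence. A real
    system would precompute the coarse features of every prototype once.
    """
    unknown = coarse_features(frames)
    ranked = sorted(
        prototypes,
        key=lambda w: coarse_distance(unknown, coarse_features(prototypes[w])),
    )
    shortlist = ranked[:shortlist_size]  # stage one limits the expensive matches
    best = min(shortlist, key=lambda w: dtw_distance(frames, prototypes[w]))
    return best, shortlist
```

In this sketch the cost of the first stage grows with vocabulary size but each comparison touches only a handful of vectors, while the DP cost is paid only for the `shortlist_size` surviving entries.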

References (14)
Thomas B. Martin, N. Rex Dixon, Automatic Speech and Speaker Recognition, John Wiley & Sons, Inc. (1979)
L. Bahl, A. Cole, F. Jelinek, R. Mercer, A. Nadas, D. Nahamoo, M. Picheny, Recognition of isolated-word sentences from a 5000-word vocabulary office correspondence task, International Conference on Acoustics, Speech, and Signal Processing, vol. 8, pp. 1065-1067 (1983), 10.1109/ICASSP.1983.1172161
D. Burr, B. Ackland, N. Weste, A high speed array computer for dynamic time warping, International Conference on Acoustics, Speech, and Signal Processing, vol. 6, pp. 471-474 (1981), 10.1109/ICASSP.1981.1171152
A. Rosenberg, L. Rabiner, S. Levinson, J. Wilpon, A preliminary study on the use of demisyllables in automatic speech recognition, International Conference on Acoustics, Speech, and Signal Processing, vol. 6, pp. 967-970 (1981), 10.1109/ICASSP.1981.1171360
Y. Grenier, L. Miclet, J. Maurin, H. Michel, Speaker adaptation for phoneme recognition, International Conference on Acoustics, Speech, and Signal Processing, vol. 6, pp. 1273-1275 (1981), 10.1109/ICASSP.1981.1171364
H. Silverman, N. Dixon, A comparison of several speech-spectra classification methods, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, pp. 289-295 (1976), 10.1109/TASSP.1976.1162814
N. Dixon, H. Silverman, The 1976 modular acoustic processor (MAP), IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 25, pp. 367-379 (1977), 10.1109/TASSP.1977.1162985
H. Silverman, N. Dixon, A parametrically controlled spectral analysis system for speech, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 22, pp. 362-381 (1974), 10.1109/TASSP.1974.1162599
H. Silverman, An introduction to programming the Winograd Fourier transform algorithm (WFTA), IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 25, pp. 152-165 (1977), 10.1109/TASSP.1977.1162924
L. Rabiner, J. Wilpon, Isolated word recognition using a two-pass pattern recognition approach, ICASSP '81. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 6, pp. 724-727 (1981), 10.1109/ICASSP.1981.1171228