Authors: T. Kaneko, N. Dixon
DOI: 10.1109/TASSP.1983.1164211
Keywords: Two stages, Computer science, Utterance, Dynamic programming, Vocabulary, Speech recognition, Linear predictive coding, Response time, Robustness (computer science), Computation, Signal processing
Abstract: Very short response time is a critical requirement for automatic discrete-utterance recognition. The real-time vocabulary size of most of today's commercially available recognizers is limited to several hundred utterances, primarily because detailed acoustic matching involves considerable computation. The method presented here offers an economical solution to the large-vocabulary recognition problem by carrying out recognition in two stages. In the initial stage, the incoming utterance is linearly matched against the entire vocabulary using only a few features: utterance duration and either two or three average spectra for each utterance. While the number of prototypes is large, the computation required per match is substantially reduced. During this stage, a preset number of best-match candidates is determined for the unknown input. The second stage is performed on this candidate list, based upon more detailed features (e.g., 10-ms log-power spectra) and a more elaborate matching methodology, e.g., dynamic programming (DP). Evaluation experiments were conducted on 2000 frequent words from an office-correspondence corpus, spoken by normal adult-male talkers. First-stage candidate lists of 30 to 50 items were observed to include the "correct" word between 99.0 and 99.5 percent of the time. Using DP on the spectral samples, recognition accuracy ranged from 86.5 to 94.5 percent. When a match-limiter is used with 50- to 64-word candidate lists, the recognizer makes near-real-time recognition feasible.
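The two-stage scheme described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are plain lists of floats standing in for 10-ms log-power spectra; the "average spectra" are taken over k equal-length time segments; the first-stage score is a weighted sum of the duration difference and Euclidean distances between average spectra; the second stage uses plain dynamic time warping (DTW) with a Euclidean local cost; and the match-limiter is interpreted, hypothetically, as abandoning a DP match once every cell of a row already exceeds the best complete score found so far. All function names (coarse_features, coarse_distance, dtw_distance, build_index, recognize) are illustrative. Requires Python 3.8+ for math.dist.

```python
import math
from typing import Dict, List, Sequence, Tuple

Spectrum = Sequence[float]      # one frame, standing in for a 10-ms log-power spectrum
Utterance = Sequence[Spectrum]  # a sequence of frames
Coarse = Tuple[int, List[List[float]]]  # (duration in frames, k average spectra)


def coarse_features(frames: Utterance, k: int = 3) -> Coarse:
    """Duration plus k average spectra, one per equal-length segment (assumed segmentation)."""
    n, dim = len(frames), len(frames[0])
    avgs = []
    for s in range(k):
        lo = s * n // k
        hi = max((s + 1) * n // k, lo + 1)  # at least one frame per segment
        seg = frames[lo:hi]
        avgs.append([sum(f[d] for f in seg) / len(seg) for d in range(dim)])
    return n, avgs


def coarse_distance(a: Coarse, b: Coarse, dur_weight: float = 1.0) -> float:
    """Cheap first-stage score: weighted duration gap plus spectral gaps (assumed form)."""
    (la, sa), (lb, sb) = a, b
    return dur_weight * abs(la - lb) + sum(math.dist(x, y) for x, y in zip(sa, sb))


def dtw_distance(a: Utterance, b: Utterance, limit: float = math.inf) -> float:
    """Second-stage DP (dynamic time warping) with a Euclidean local cost.

    Hypothetical match-limiter: return inf as soon as every cell in the current
    row exceeds `limit`; with nonnegative costs the final score can only be larger.
    """
    m = len(b)
    prev = [0.0] + [math.inf] * m  # row 0: only the origin is reachable
    for i in range(1, len(a) + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        if min(cur) > limit:
            return math.inf
        prev = cur
    return prev[m]


def build_index(prototypes: Dict[str, Utterance], k: int = 3) -> Dict[str, Coarse]:
    """Precompute coarse features for every vocabulary prototype (done once, offline)."""
    return {word: coarse_features(frames, k) for word, frames in prototypes.items()}


def recognize(unknown: Utterance, prototypes: Dict[str, Utterance],
              index: Dict[str, Coarse], n_candidates: int = 50,
              k: int = 3) -> Tuple[str, float, List[str]]:
    # Stage 1: linear scan of the whole vocabulary using only the cheap coarse score.
    uf = coarse_features(unknown, k)
    ranked = sorted(index, key=lambda w: coarse_distance(uf, index[w]))
    candidates = ranked[:n_candidates]
    # Stage 2: detailed DP matching, restricted to the short candidate list.
    best_word, best_score = candidates[0], math.inf
    for word in candidates:
        score = dtw_distance(unknown, prototypes[word], limit=best_score)
        if score < best_score:
            best_word, best_score = word, score
    return best_word, best_score, candidates


if __name__ == "__main__":
    A, B = [1.0, 0.0], [0.0, 1.0]  # toy two-band "spectra"
    prototypes = {"yes": [A] * 5 + [B] * 5, "no": [B] * 8, "maybe": [[1.0, 1.0]] * 12}
    index = build_index(prototypes)
    word, score, cands = recognize([A] * 4 + [B] * 6, prototypes, index, n_candidates=2)
    print(word, score, cands)  # prints: yes 0.0 ['yes', 'no']
```

The point the sketch tries to mirror is that the stage-1 cost per prototype is small and fixed (k spectral distances plus one duration difference), so the expensive DP, whose cost grows with the product of the two utterance lengths, runs only n_candidates times rather than once per vocabulary word.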