Authors: Louis D. Braida, Paul Duchnowski
DOI:
Keywords:
Abstract: Speech is a wideband signal with cues identifying a particular speech element distributed across frequency. To capture these cues, most ASR systems analyze the speech signal into spectral (or spectrally-derived) components prior to recognition. Traditionally, these components are integrated across frequency to form a vector of "acoustic evidence" on which the decision by the ASR system is based. This thesis develops an alternate approach, post-labeling integration. In this scheme, tentative decisions, or labels, of the identity of a given speech element are assigned in parallel by sub-recognizers, each operating on a band-limited portion of the speech waveform. Outputs of these independent channels are subsequently combined (integrated) to render the final decision. Remarkably good recognition of bandlimited nonsense syllables by humans leads to the consideration of this method. It also allows potentially more accurate parameterization of the speech waveform and simultaneously robust estimation of parameter probabilities. The algorithm represents an attempt to make explicit use of redundancies in speech. Three basic methods of parameterizing the bandlimited input of the sub-recognizers were considered, focusing respectively on LPC coefficients, cepstrum coefficients, and parameters based on the autocorrelation function. Four sub-recognizers were implemented as discrete Hidden Markov Model (HMM) systems. A Maximum A Posteriori (MAP) hypothesis testing approach was applied to the problem of integrating the individual sub-recognizer decisions on a frame-by-frame basis. Final segmentation was achieved by a secondary HMM. Five methods of estimating the probabilities necessary for MAP integration were tested. The proposed structure was applied to the task of phonetic, speaker-independent, continuous speech recognition. Performance for several combinations of parameterization schemes and integration methods was measured. The best score of 58.5% on a 39-phone alphabet is roughly comparable to published performance of traditional HMM systems and warrants further development. Potential sources of weakness in the system as implemented are identified and improvements suggested. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
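The frame-level MAP integration step described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes the sub-recognizer labels are conditionally independent given the true phone, which the abstract does not state. The names `map_integrate`, `integrate_frames`, `PRIOR` and `LIKELIHOOD` are hypothetical, and the probability values are invented toy numbers covering two phones and two bands.

```python
import math

# Toy, made-up tables (assumption): P(phone) and P(label | phone) per band.
PRIOR = {"s": 0.5, "f": 0.5}
LIKELIHOOD = [
    # Band 0: weakly informative channel.
    {"s": {"s": 0.6, "f": 0.4}, "f": {"s": 0.5, "f": 0.5}},
    # Band 1: strongly informative channel.
    {"s": {"s": 0.9, "f": 0.1}, "f": {"s": 0.2, "f": 0.8}},
]


def map_integrate(labels, prior, likelihood):
    """Pick the phone c maximizing log P(c) + sum_k log P(label_k | c).

    labels[k] is the tentative label from the sub-recognizer for band k.
    Assumes conditional independence across bands and nonzero
    (e.g. smoothed) probabilities, so that math.log is defined.
    """
    best_phone, best_score = None, -math.inf
    for phone, p in prior.items():
        score = math.log(p)
        for band, label in enumerate(labels):
            score += math.log(likelihood[band][phone][label])
        if score > best_score:
            best_phone, best_score = phone, score
    return best_phone


def integrate_frames(frame_labels, prior, likelihood):
    """Apply MAP integration independently to every frame."""
    return [map_integrate(labels, prior, likelihood) for labels in frame_labels]


if __name__ == "__main__":
    # Each inner list holds one frame's labels, ordered by band.
    frames = [["f", "s"], ["s", "f"], ["s", "s"]]
    print(integrate_frames(frames, PRIOR, LIKELIHOOD))
```

In this toy setup the more reliable band dominates when the two channels disagree. The abstract describes a secondary HMM that then produces the final segmentation from the frame-level decisions, and the sketch does not attempt that stage.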