Speech recognition of Mandarin monosyllables

Author: Tze Fen Li

DOI: 10.1016/S0031-3203(03)00135-3

Keywords: Pattern recognition, Artificial intelligence, Speaker recognition, Speech recognition, Computer science, Mandarin Chinese, Waveform, Hidden Markov model, Syllable, Feature extraction

Abstract: The nonlinear dynamic characteristics of expansion and contraction of the sequential time-varying features of syllable pronunciations greatly complicate the tasks of automatic speech recognition. Each syllable is represented by a sequence of vectors of linear predictive coding cepstra (LPCC). Even if the same speaker utters the same syllable, the duration of the stable parts of the sequence of LPCC vectors changes every time. Therefore, the stable parts are contracted such that the compressed speech waveform has the same length. We propose five different simple techniques to contract the sequence of vectors. A simplified Bayes decision rule with weighted variance is used to classify 408 speaker-dependent Mandarin syllables. For the 408 syllables, the recognition rate is 94.36%, as compared with 79.78% obtained using hidden Markov models (HMM). A recognition rate of 98.16% is achieved within the top 3 candidates. The features proposed in this paper to represent the syllables are easy to extract. The computation for feature extraction and classification is much faster than with the HMM or any other known techniques.
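The two steps named in the abstract, contracting a variable-length sequence of feature vectors to a fixed length and then classifying it with a simplified Bayes rule, can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not describe the five contraction techniques or the variance weighting, so the equal-segment averaging in `contract`, the diagonal-Gaussian scoring with equal priors in `classify`, and all function names here are assumptions chosen for illustration only.

```python
import math


def contract(frames, n_segments):
    """Hypothetical contraction: split the frame sequence into n_segments
    roughly equal parts and average each part, so every utterance yields
    a feature vector of the same length (n_segments * dim)."""
    n = len(frames)
    if n < n_segments:
        raise ValueError("need at least n_segments frames")
    dim = len(frames[0])
    out = []
    for s in range(n_segments):
        lo = s * n // n_segments
        hi = (s + 1) * n // n_segments
        seg = frames[lo:hi]
        out.extend(sum(f[d] for f in seg) / len(seg) for d in range(dim))
    return out


def fit(samples_by_class):
    """Estimate a per-class mean and per-dimension variance.
    Assumption: features are treated as independent (diagonal covariance)."""
    model = {}
    for label, samples in samples_by_class.items():
        m = len(samples)
        dim = len(samples[0])
        mean = [sum(x[d] for x in samples) / m for d in range(dim)]
        var = [
            max(sum((x[d] - mean[d]) ** 2 for x in samples) / m, 1e-6)
            for d in range(dim)
        ]
        model[label] = (mean, var)
    return model


def classify(model, x, top=1):
    """Rank classes by the negative Gaussian log-likelihood (constants dropped).
    Assumption: equal priors; no variance weighting is applied here."""
    def score(mean, var):
        return sum(
            math.log(v) + (xi - mu) ** 2 / v
            for xi, mu, v in zip(x, mean, var)
        )

    ranked = sorted(model, key=lambda label: score(*model[label]))
    return ranked[:top]
```

Calling `classify(model, x, top=3)` returns the three best-scoring labels, which mirrors the top-3 candidate evaluation mentioned in the abstract. In practice each `frames` input would be a list of LPCC vectors computed from one utterance, a step this sketch leaves out.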
