ATREUS: a comparative study of continuous speech recognition systems at ATR

Authors: A. Nagai, K. Yamaguchi, S. Sagayama, A. Kurematsu

DOI: 10.1109/ICASSP.1993.319251

Keywords: Hidden Markov model, Artificial neural network, Computer science, Fuzzy logic, Cepstrum, Vector quantization, Pattern recognition, Speech recognition, Phrase, Artificial intelligence, Codebook, Speech synthesis, Phone

Abstract: The authors describe ATREUS, an aggregation of a large variety of continuous speech recognition systems, forming the spoken-input front end of an interpreting telephony system. ATREUS includes the following phone models: discrete HMMs (hidden Markov models) with fuzzy vector quantization (VQ) and multiple codebooks; mixture density HMMs; hidden Markov networks derived from the SSS (successive state splitting) algorithm; time-delay neural networks; and fuzzy partition models. Its speaker modes involve speaker-dependent, speaker-independent, and speaker-adaptive techniques such as codebook mapping for VQ-HMMs, vector field smoothing for all types of HMMs, and neural network mapping. A comparative study is given from the viewpoints of structure, constituent techniques, hardware implementation, and performance. ATREUS was evaluated on Japanese phrase recognition. The combination called ATREUS/SSS-LR had the best performance among the systems.

References (19)
Shigeki Sagayama, Kenji Kita, Frank K. Soong, Kouichi Yamaguchi. Continuous mixture HMM-LR using the A* algorithm for continuous speech recognition. Conference of the International Speech Communication Association (1992).
K. Ohkura. Speaker adaptation based on transfer vector field smoothing with continuous mixture density HMMs. Conference of the International Speech Communication Association, pp. 369-372 (1992).
Shigeki Sagayama, Hiroaki Hattori. Vector field smoothing principle for speaker adaptation. Conference of the International Speech Communication Association (1992).
Hidefumi Sawai. The TDNN-LR large-vocabulary and continuous speech recognition system. Conference of the International Speech Communication Association (1990).
Shigeki Sagayama, Toshiyuki Hanazawa, Akira Kurematsu, Kenji Kita, Tadashi Suzuki, Kiyohiro Shikano, Tsuyoshi Kawabata, Tomohiro Iwasaki, Tsuyoshi Morimoto, Akito Nagai, Kunio Nakajima. Hardware implementation of realtime 1000-word HMM-LR continuous speech recognition. Conference of the International Speech Communication Association (1992).
Shigeki Sagayama, Jun-ichi Takami, Akito Nagai. The SSS-LR continuous speech recognition system: integrating SSS-derived allophone models and a phoneme-context-dependent LR parser. Conference of the International Speech Communication Association (1992).
Yoshinaga Kato, Masahide Sugiyama, Keiji Fukuzawa. A fuzzy partition model (FPM) neural network architecture for speaker-independent continuous speech recognition. Conference of the International Speech Communication Association (1992).
J. Takami, A. Kai, S. Sagayama. Speech recognition by combining pairwise discriminant time-delay neural networks and predictive LR-parser. Neural Networks for Signal Processing: Proceedings of the 1991 IEEE Workshop, pp. 327-336 (1991). DOI: 10.1109/NNSP.1991.239509
T. Hanazawa, K. Kita, S. Nakamura, T. Kawabata, K. Shikano. ATR HMM-LR continuous speech recognition system. International Conference on Acoustics, Speech, and Signal Processing, pp. 611-614 (1990). DOI: 10.1109/ICASSP.1990.115535
J. Takami, S. Sagayama. A successive state splitting algorithm for efficient allophone modeling. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 573-576 (1992). DOI: 10.1109/ICASSP.1992.225855