Maximum likelihood successive state splitting

Authors: H. Singer, M. Ostendorf

DOI: 10.1109/ICASSP.1996.543192

Keywords: Pattern recognition, Context (language use), Maximum likelihood, Computer science, Expectation–maximization algorithm, Decision tree, Cluster analysis, Network topology, Artificial intelligence, Hidden Markov model, Greedy algorithm, Decision theory

Abstract: Modeling contextual variations of phones is widely accepted as an important aspect of a continuous speech recognition system, and much research has been devoted to finding robust models of context for HMM systems. In particular, decision tree clustering has been used to tie output distributions across pre-defined states, and successive state splitting (SSS) has been used to define parsimonious HMM topologies. We describe a new HMM design algorithm, called maximum likelihood successive state splitting (ML-SSS), that combines the advantages of both these approaches. Specifically, an HMM topology is designed using a greedy search for the best temporal and contextual splits using a constrained EM algorithm. In Japanese phone recognition experiments, ML-SSS shows recognition performance gains and training cost reduction over SSS under several training conditions.
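
The greedy, likelihood-driven search mentioned in the abstract can be illustrated with a deliberately simplified toy. The sketch below is not the ML-SSS algorithm: it has no HMM, no temporal or contextual split domains, and no constrained EM. It only assumes 1-D data modeled by single Gaussians and repeatedly splits whichever cluster yields the largest gain in log-likelihood. All function names (`gaussian_loglik`, `best_split`, `greedy_split`) and the variance floor are hypothetical choices made for illustration.

```python
import math


def gaussian_loglik(xs, var_floor=1e-6):
    """Log-likelihood of xs under one Gaussian fitted by maximum likelihood."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    var = max(ss / n, var_floor)  # assumed floor, avoids log(0) on constant data
    return -0.5 * (n * math.log(2 * math.pi * var) + ss / var)


def best_split(xs):
    """Best two-way threshold split of xs as (gain, left, right).

    Returns (0.0, None, None) when no split improves the likelihood
    or when xs is too small to leave two points on each side.
    """
    xs = sorted(xs)
    base = gaussian_loglik(xs)
    best = (0.0, None, None)
    for i in range(2, len(xs) - 1):  # keep at least 2 points per side
        left, right = xs[:i], xs[i:]
        gain = gaussian_loglik(left) + gaussian_loglik(right) - base
        if gain > best[0]:
            best = (gain, left, right)
    return best


def greedy_split(xs, max_states):
    """Greedily split the cluster with the largest likelihood gain."""
    states = [sorted(xs)]
    while len(states) < max_states:
        scored = [(best_split(s), k) for k, s in enumerate(states)]
        (gain, left, right), k = max(scored, key=lambda t: t[0][0])
        if left is None:  # no remaining split increases the likelihood
            break
        states[k:k + 1] = [left, right]  # replace the parent by its two children
    return states


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3, 10.0, 10.1, 10.2, 10.3]
    for state in greedy_split(data, 3):
        print(state)
```

Each iteration scores one candidate split per existing cluster and commits only the single best one, which is what makes the search greedy; stopping at `max_states` stands in for whatever stopping criterion a real system would use.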
