Four-level tied-structure for efficient representation of acoustic modeling

Authors: S. Takahashi, S. Sagayama

DOI: 10.1109/ICASSP.1995.479643

Keywords: Estimation theory, Representation (mathematics), Artificial intelligence, Hidden Markov model, Robustness (computer science), Gaussian process, Training set, Dimension (vector space), Pattern recognition, Context model, Multivariate normal distribution, Computer science, Word recognition

Abstract: One of the problems with context-dependent HMMs is that a large number of model parameters must be estimated from a limited amount of training data. Parameters that have the same property are tied in order to represent acoustic models efficiently. This paper proposes a four-level tied-structure for phoneme models. The four levels are 1) the model level, 2) the state level, 3) the distribution level, and 4) the feature parameter level. Although some techniques have been proposed for the first three levels, tying at the fourth level is newly proposed in this paper. We found that tying at the fourth level makes it possible to represent 1,600 mean vectors of multivariate Gaussian mixtures by combinations of 16 representative values in each dimension. Experimental results show that this tying reduces the calculation required for recognition without significantly degrading recognition performance. Furthermore, we also found that it is effective for training.
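The fourth (feature parameter) level can be illustrated with a minimal sketch, assuming it amounts to scalar quantization: each dimension of the mean vectors is clustered independently into 16 representative values, so every mean vector is stored as a vector of 16-way indices. The 1-D k-means (Lloyd) clustering, the single variance per dimension shared by all Gaussians, and all function names (`quantize_dimension`, `tie_means`, `log_likelihoods`) are hypothetical choices for illustration, not the authors' method. The second function shows one way such tying could reduce recognition cost: per frame, squared distances are computed only against the 16 representative values of each dimension, and each Gaussian's score is then a sum of table look-ups.

```python
import math
import random
from typing import List, Sequence, Tuple


def _nearest(v: float, codebook: Sequence[float]) -> int:
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda k: abs(v - codebook[k]))


def quantize_dimension(values: Sequence[float], n_levels: int = 16,
                       n_iter: int = 20) -> Tuple[List[float], List[int]]:
    """1-D k-means (Lloyd's algorithm): map each value to one of n_levels codewords."""
    ordered = sorted(values)
    n = len(ordered)
    # Initialise the codewords at evenly spaced order statistics (min ... max).
    codebook = [ordered[k * (n - 1) // max(n_levels - 1, 1)] for k in range(n_levels)]
    for _ in range(n_iter):
        sums, counts = [0.0] * n_levels, [0] * n_levels
        for v in values:
            k = _nearest(v, codebook)
            sums[k] += v
            counts[k] += 1
        # Move each codeword to the mean of its cell; an empty cell keeps its value.
        updated = [sums[k] / counts[k] if counts[k] else codebook[k] for k in range(n_levels)]
        if updated == codebook:
            break
        codebook = updated
    return codebook, [_nearest(v, codebook) for v in values]


def tie_means(means: Sequence[Sequence[float]],
              n_levels: int = 16) -> Tuple[List[List[float]], List[List[int]]]:
    """Tie each dimension of the mean vectors to n_levels shared values.

    Returns codebooks[d][k] (representative values) and indices[g][d]."""
    n_dims = len(means[0])
    codebooks, columns = [], []
    for d in range(n_dims):
        cb, idx = quantize_dimension([m[d] for m in means], n_levels)
        codebooks.append(cb)
        columns.append(idx)
    indices = [[columns[d][g] for d in range(n_dims)] for g in range(len(means))]
    return codebooks, indices


def log_likelihoods(x: Sequence[float], codebooks: Sequence[Sequence[float]],
                    indices: Sequence[Sequence[int]],
                    variances: Sequence[float]) -> List[float]:
    """Diagonal-Gaussian log-likelihood of frame x under every tied Gaussian.

    Hypothetical simplification: one variance per dimension, shared by all Gaussians."""
    n_dims = len(x)
    # Per-frame table: n_dims * n_levels scaled squared distances, computed once.
    table = [[(x[d] - c) ** 2 / variances[d] for c in codebooks[d]] for d in range(n_dims)]
    log_norm = -0.5 * (n_dims * math.log(2 * math.pi) + sum(math.log(v) for v in variances))
    # Each Gaussian then needs only n_dims look-ups and additions.
    return [log_norm - 0.5 * sum(table[d][idx[d]] for d in range(n_dims)) for idx in indices]


if __name__ == "__main__":
    random.seed(0)
    n_gauss, n_dims = 1600, 8  # 1,600 means as in the abstract; n_dims is arbitrary
    means = [[random.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_gauss)]
    codebooks, indices = tie_means(means)
    frame = [random.gauss(0.0, 1.0) for _ in range(n_dims)]
    scores = log_likelihoods(frame, codebooks, indices, [1.0] * n_dims)
    best = max(range(n_gauss), key=scores.__getitem__)
    print(f"best Gaussian: {best}, log-likelihood {scores[best]:.3f}")
```

Under these assumptions, each frame needs 16·D subtract-square-scale operations instead of 1,600·D, plus 1,600·D additions for the look-up sums, where D is the feature dimension.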

References (8)
Philip C. Woodland, Steve J. Young, The use of state tying in continuous speech recognition. Conference of the International Speech Communication Association (1993).
S. Sagayama, Phoneme environment clustering for speech recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 397-400 (1989). DOI: 10.1109/ICASSP.1989.266449
J. Takami, S. Sagayama, A successive state splitting algorithm for efficient allophone modeling. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, pp. 573-576 (1992). DOI: 10.1109/ICASSP.1992.225855
X.D. Huang, Phoneme classification using semicontinuous hidden Markov models. IEEE Transactions on Signal Processing, vol. 40, pp. 1062-1067 (1992). DOI: 10.1109/78.134469
T. Kosaka, J. Takami, S. Sagayama, Rapid speaker adaptation using speaker-mixture allophone models applied to speaker-independent speech recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, pp. 570-573 (1993). DOI: 10.1109/ICASSP.1993.319371
Kai-Fu Lee, Context-independent phonetic hidden Markov models for speaker-independent continuous speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, pp. 347-366 (1990). DOI: 10.1007/978-3-642-76626-8_15
Mei-Yuh Hwang, Xuedong Huang, Shared-distribution hidden Markov models for speech recognition. IEEE Transactions on Speech and Audio Processing, vol. 1, pp. 414-420 (1993). DOI: 10.1109/89.242487
D.B. Paul, The Lincoln tied-mixture HMM continuous speech recognizer. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 329-332 (1991). DOI: 10.1109/ICASSP.1991.150343