Variational inference and learning for segmental switching state space models of hidden speech dynamics

Authors: L.J. Lee, H. Attias, Li Deng

DOI: 10.1109/ICASSP.2003.1198920

Keywords: Speech enhancement, Machine learning, Natural language, Computer science, Inference, Bayesian network, Speech processing, Artificial intelligence, Speech production, Hidden Markov model, State space

Abstract: This paper describes novel and powerful variational EM algorithms for the segmental switching state space models used in speech applications, which are capable of capturing key internal (or hidden) dynamics of natural speech production. Hidden dynamic models (HDMs) have recently become a promising class of acoustic models that incorporate crucial speech-specific knowledge and overcome many inherent weaknesses of traditional HMMs. However, the lack of powerful and efficient statistical learning algorithms is one of the main obstacles preventing them from being well studied and widely used. Since exact inference and learning are intractable, a variational approach is taken to develop effective approximate algorithms. We have implemented the segmental constraint crucial for modeling speech dynamics, and we present algorithms for recovering hidden speech dynamics and discrete speech units from acoustic data only. The effectiveness of the developed algorithms is verified by experiments on simulated data and Switchboard speech data.
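The abstract describes variational inference for a switching state space model only in words. The sketch below illustrates the general idea on a deliberately simplified, hypothetical model: a scalar target-directed hidden dynamic x_t whose target u and coefficient a depend on a discrete unit s_t, observed directly through additive Gaussian noise. It assumes a structured mean-field factorization q(s)q(x), which may differ from the variational posterior actually used in the paper, and it only approximates the segmental constraint with a sticky Markov chain. Only the inference (E-step) alternation is shown; M-step parameter updates are omitted. The function names (`sample`, `update_qx`, `update_qs`, `variational_e_step`) and all parameter values are illustrative assumptions.

```python
import math
import random

# Hypothetical toy model (an assumption, not the paper's exact parameterization):
#   s_t: discrete unit, sticky Markov chain with initial probs pi, transitions P
#   x_t = a[s_t] * x_{t-1} + (1 - a[s_t]) * u[s_t] + w_t,  w_t ~ N(0, q)
#   y_t = x_t + v_t,                                      v_t ~ N(0, r)


def _normalize(row):
    z = sum(row)
    return [v / z for v in row]


def sample(T, a, u, P, q, r, seed=0):
    """Draw units s, hidden dynamics x, and observations y from the toy model."""
    rng = random.Random(seed)
    s = [rng.randrange(len(a))]
    x = [u[s[0]]]
    for _ in range(1, T):
        k = rng.choices(range(len(a)), weights=P[s[-1]])[0]
        s.append(k)
        x.append(a[k] * x[-1] + (1 - a[k]) * u[k] + rng.gauss(0.0, math.sqrt(q)))
    y = [xt + rng.gauss(0.0, math.sqrt(r)) for xt in x]
    return s, x, y


def update_qx(y, gamma, a, u, q, r, m0=0.0, v0=1.0):
    """Gaussian q(x) given q(s): tridiagonal precision, elimination, backward pass."""
    T, K = len(y), len(a)
    b = [(1 - a[k]) * u[k] for k in range(K)]
    D, h, O = [1.0 / r] * T, [yt / r for yt in y], [0.0] * T  # O[t] couples x_{t-1}, x_t
    D[0] += 1.0 / v0
    h[0] += m0 / v0
    for t in range(1, T):
        g = gamma[t]  # expected dynamics coefficients under q(s_t)
        Ea = sum(g[k] * a[k] for k in range(K))
        Eb = sum(g[k] * b[k] for k in range(K))
        Ea2 = sum(g[k] * a[k] ** 2 for k in range(K))
        Eab = sum(g[k] * a[k] * b[k] for k in range(K))
        D[t] += 1.0 / q
        D[t - 1] += Ea2 / q
        O[t] = -Ea / q
        h[t] += Eb / q
        h[t - 1] -= Eab / q
    d, gp = [D[0]], [h[0]]  # forward elimination of x_0, x_1, ...
    for t in range(1, T):
        d.append(D[t] - O[t] ** 2 / d[t - 1])
        gp.append(h[t] - O[t] * gp[t - 1] / d[t - 1])
    m, V, C = [0.0] * T, [0.0] * T, [0.0] * T  # C[t] = Cov(x_t, x_{t+1})
    m[-1], V[-1] = gp[-1] / d[-1], 1.0 / d[-1]
    for t in range(T - 2, -1, -1):  # backward pass using p(x_t | x_{t+1})
        m[t] = (gp[t] - O[t + 1] * m[t + 1]) / d[t]
        V[t] = 1.0 / d[t] + (O[t + 1] / d[t]) ** 2 * V[t + 1]
        C[t] = -(O[t + 1] / d[t]) * V[t + 1]
    return m, V, C


def update_qs(m, V, C, a, u, q, pi, P):
    """q(s) given q(x): forward-backward with expected log dynamics as emission scores."""
    T, K = len(m), len(a)
    b = [(1 - a[k]) * u[k] for k in range(K)]
    ev = [[1.0] * K]  # no dynamics term at t = 0
    for t in range(1, T):
        Ex2, Ep2 = V[t] + m[t] ** 2, V[t - 1] + m[t - 1] ** 2
        Exp = C[t - 1] + m[t] * m[t - 1]
        ll = [-0.5 / q * (Ex2 - 2 * a[k] * Exp - 2 * b[k] * m[t] + a[k] ** 2 * Ep2
                          + 2 * a[k] * b[k] * m[t - 1] + b[k] ** 2) for k in range(K)]
        top = max(ll)
        ev.append([math.exp(v - top) for v in ll])
    alpha = [_normalize([pi[k] * ev[0][k] for k in range(K)])]
    for t in range(1, T):
        alpha.append(_normalize([ev[t][k] * sum(alpha[-1][j] * P[j][k] for j in range(K))
                                 for k in range(K)]))
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = _normalize([sum(P[j][k] * ev[t + 1][k] * beta[t + 1][k] for k in range(K))
                              for j in range(K)])
    return [_normalize([alpha[t][k] * beta[t][k] for k in range(K)]) for t in range(T)]


def variational_e_step(y, a, u, q, r, pi, P, n_iter=10):
    """Alternate the q(x) and q(s) updates, starting from a uniform q(s)."""
    gamma = [[1.0 / len(a)] * len(a) for _ in y]
    for _ in range(n_iter):
        m, V, C = update_qx(y, gamma, a, u, q, r)
        gamma = update_qs(m, V, C, a, u, q, pi, P)
    return gamma, m


a, u = [0.5, 0.5], [-2.0, 2.0]
P = [[0.95, 0.05], [0.05, 0.95]]
s, x, y = sample(60, a, u, P, q=0.01, r=0.01, seed=1)
gamma, m = variational_e_step(y, a, u, 0.01, 0.01, [0.5, 0.5], P)
acc = sum((g[1] > g[0]) == (st == 1) for g, st in zip(gamma, s)) / len(s)
print(f"frame accuracy of argmax q(s): {acc:.2f}")
```

Each half of the alternation is an exact coordinate update under this factorization: with q(s) fixed, q(x) is Gaussian with a tridiagonal precision, so one forward elimination and one backward pass give its means, variances, and lag-one covariances; with q(x) fixed, q(s) is an HMM whose per-frame scores are the expected log dynamics densities, solved by forward-backward.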
