Uncertainty-based learning of acoustic models from noisy data

Authors: Alexey Ozerov, Mathieu Lagrange, Emmanuel Vincent

DOI: 10.1016/J.CSL.2012.07.002

Keywords: Artificial intelligence, Pattern recognition, Speaker recognition, Expectation–maximization algorithm, Mixture model, Maximum a posteriori estimation, Speech recognition, Acoustic model, Gaussian, Decoding methods, Hidden Markov model, Computer science

Abstract: We consider the problem of acoustic modeling of noisy speech data, where the uncertainty over the data is given by a Gaussian distribution. While this uncertainty has been exploited at the decoding stage via uncertainty decoding, its usage at the training stage remains limited to static model adaptation. We introduce a new expectation maximization (EM) based technique, which we call uncertainty training, that allows us to train Gaussian mixture models (GMMs) or hidden Markov models (HMMs) directly from noisy data with dynamic uncertainty. We evaluate the potential of this technique for a GMM-based speaker recognition task on speech data corrupted by real-world domestic background noise, using a state-of-the-art signal enhancement technique and various uncertainty estimation techniques as a front-end. Compared to conventional training, the proposed algorithm results in a 3-4% absolute improvement in speaker recognition accuracy when training from either matched, unmatched or multi-condition noisy data. This algorithm is also applicable, with minor modifications, to maximum a posteriori (MAP) or maximum likelihood linear regression (MLLR) acoustic model adaptation from noisy data, and to data other than audio.
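
As an illustration of the decoding-side idea the abstract refers to, the sketch below scores one enhanced feature vector under a diagonal-covariance GMM while adding the frame's Gaussian uncertainty variance to each component variance. This is the standard form of uncertainty decoding, not the paper's uncertainty training algorithm. The function name `uncertainty_gmm_loglik`, its arguments, and the diagonal-covariance assumption are hypothetical choices made for this example.

```python
import numpy as np

def uncertainty_gmm_loglik(x_hat, unc_var, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal GMM.

    x_hat     : (D,)   enhanced (denoised) feature vector
    unc_var   : (D,)   per-dimension uncertainty variance for this frame
    weights   : (K,)   mixture weights, summing to 1
    means     : (K, D) component means
    variances : (K, D) component variances (diagonal covariances)

    Assumption for this sketch: the uncertainty is Gaussian and diagonal,
    so it simply adds to each component variance.
    """
    x_hat = np.asarray(x_hat, dtype=float)
    unc_var = np.asarray(unc_var, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Model variance inflated by the frame-dependent uncertainty, shape (K, D).
    total_var = variances + unc_var
    diff = x_hat - means

    # Per-component Gaussian log-density, summed over dimensions, shape (K,).
    log_comp = -0.5 * np.sum(
        np.log(2.0 * np.pi * total_var) + diff ** 2 / total_var, axis=1
    )

    # Log-sum-exp over components for numerical stability.
    a = np.log(weights) + log_comp
    m = a.max()
    return float(m + np.log(np.sum(np.exp(a - m))))
```

Setting `unc_var` to zeros recovers the ordinary GMM log-likelihood, so frames that the front-end considers reliable are scored as usual, while frames with large uncertainty are penalized less for lying far from the component means.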
