Authors: Alexey Ozerov, Mathieu Lagrange, Emmanuel Vincent
DOI: 10.1016/J.CSL.2012.07.002
Keywords: Artificial intelligence, Pattern recognition, Speaker recognition, Expectation–maximization algorithm, Mixture model, Maximum a posteriori estimation, Speech recognition, Acoustic model, Gaussian, Decoding methods, Hidden Markov model, Computer science
Abstract: We consider the problem of acoustic modeling of noisy speech data, where the uncertainty over the data is given by a Gaussian distribution. While this uncertainty has been exploited at the decoding stage via uncertainty decoding, its usage at the training stage remains limited to static model adaptation. We introduce a new expectation maximization (EM) based technique, which we call uncertainty training, that allows us to train Gaussian mixture models (GMMs) or hidden Markov models (HMMs) directly from noisy data with dynamic uncertainty. We evaluate the potential of this technique for a GMM-based speaker recognition task on speech data corrupted by real-world domestic background noise, using a state-of-the-art signal enhancement technique and various uncertainty estimation techniques as a front-end. Compared to conventional training, the proposed training algorithm results in a 3-4% absolute improvement in speaker recognition accuracy when training from either matched, unmatched or multi-condition noisy data. This algorithm is also applicable, with minor modifications, to maximum a posteriori (MAP) or maximum likelihood linear regression (MLLR) acoustic model adaptation from noisy data, and to data other than audio.
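The abstract names EM-based uncertainty training without giving its update equations. The following is a minimal Python/NumPy sketch of one way such an EM loop can look for a diagonal-covariance GMM. It assumes that each observed (enhanced) feature equals the clean feature plus zero-mean Gaussian noise whose per-frame diagonal variance is the given uncertainty. The responsibilities then use the inflated variance (model variance plus frame uncertainty), and the M-step uses posterior moments of the clean feature. The function names `log_gauss_diag` and `uncertainty_em`, the quantile initialisation and the variance floor are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def log_gauss_diag(x, mean, var):
    """Row-wise log-density of a diagonal Gaussian; x and var are (N, D), mean is (D,)."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=1)


def uncertainty_em(x_hat, unc_var, n_comp, n_iter=100, var_floor=1e-6):
    """Fit a diagonal GMM to noisy features x_hat (N, D), where each frame carries
    a diagonal Gaussian uncertainty with variances unc_var (N, D).

    Assumed model (a sketch, not the paper's code):
        z_n ~ Categorical(weights), x_n | z_n = k ~ N(means[k], diag(variances[k])),
        x_hat_n = x_n + e_n with e_n ~ N(0, diag(unc_var[n])).
    With unc_var == 0 this reduces to conventional EM training.
    """
    n, d = x_hat.shape
    # Hypothetical deterministic initialisation: evenly spaced quantiles
    # of the first feature dimension pick the initial means.
    order = np.argsort(x_hat[:, 0])
    means = x_hat[order[np.linspace(0, n - 1, n_comp + 2)[1:-1].astype(int)]].copy()
    variances = np.tile(np.var(x_hat, axis=0), (n_comp, 1))
    weights = np.full(n_comp, 1.0 / n_comp)
    log_liks = []
    for _ in range(n_iter):
        # E-step, part 1: responsibilities from the uncertain likelihood
        # N(x_hat_n; mu_k, Sigma_k + Sigma_hat_n).
        log_joint = np.stack(
            [np.log(weights[k]) + log_gauss_diag(x_hat, means[k], variances[k] + unc_var)
             for k in range(n_comp)], axis=1)
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        log_liks.append(float(log_norm.sum()))  # observed-data log-likelihood
        resp = np.exp(log_joint - log_norm[:, None])  # (N, K)
        nk = resp.sum(axis=0) + 1e-10  # tiny offset avoids division by zero
        for k in range(n_comp):
            # E-step, part 2: posterior mean/variance of the clean feature given
            # component k (a per-dimension Wiener filter), using current parameters.
            gain = variances[k] / (variances[k] + unc_var)
            clean_mean = means[k] + gain * (x_hat - means[k])
            clean_var = (1.0 - gain) * variances[k]
            # M-step: re-estimate from expected sufficient statistics of the clean data.
            r = resp[:, k:k + 1]
            new_mean = (r * clean_mean).sum(axis=0) / nk[k]
            second = (r * (clean_mean ** 2 + clean_var)).sum(axis=0) / nk[k]
            means[k] = new_mean
            variances[k] = np.maximum(second - new_mean ** 2, var_floor)
        weights = nk / nk.sum()
    return weights, means, variances, log_liks
```

Passing `np.zeros_like(unc_var)` as the uncertainty recovers conventional EM. This gives a simple way to compare the two in this sketch: on synthetic data the uncertain version should estimate variances close to those of the clean features, while conventional EM absorbs the noise into inflated variances.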