Simplified VTS-based I-vector extraction in noise-robust speaker recognition

Authors: Yun Lei, Mitchell McLaren, Luciana Ferrer, Nicolas Scheffer

DOI: 10.1109/ICASSP.2014.6854360

Keywords: I-vector, Noise (video), Normalization (statistics), Computer science, Speaker recognition, Pattern recognition, Contrast (statistics), Speech recognition, Scale (ratio), NIST, Scheme (programming language), Artificial intelligence

Abstract: A vector Taylor series (VTS) based i-vector extractor was recently proposed for noise-robust speaker recognition by extracting synthesized clean i-vectors to be used in the standard system back-end. This approach brings significant improvements in accuracy for noisy speech conditions. However, it incurred such a large computational expense that using a state-of-the-art model size or running large-scale evaluations was impractical. In this work, we propose an efficient simplification scheme, named sVTS, in order to show that the VTS approach gives improvements in large-scale applications compared to state-of-the-art systems. In contrast to VTS, sVTS generates normalized Baum-Welch statistics and uses the standard i-vector model, making it straightforward to employ in a state-of-the-art i-vector system. Results presented on both the PRISM and NIST SRE'12 corpora show that sVTS provides significant improvements in noisy conditions, and that our simplifications result in only slight degradation with respect to the original VTS approach.
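As a rough illustration of what "normalized Baum-Welch statistics" usually means in an i-vector front-end, the sketch below computes zeroth-order and mean- and variance-normalized first-order statistics against a diagonal-covariance GMM. It is an assumption-laden toy in plain Python: the function name `normalized_bw_stats` and its arguments are hypothetical, and the VTS-based noise compensation that sVTS applies is not described in the abstract and is not reproduced here.

```python
import math


def normalized_bw_stats(frames, weights, means, variances):
    """Return (n, f_norm) for a diagonal-covariance GMM.

    n[c]      : zeroth-order statistic (soft frame count) of component c.
    f_norm[c] : first-order statistic of component c, centred on the
                component mean and scaled by its standard deviation.
    This is the conventional normalization only, not the sVTS procedure.
    """
    num_comp, dim = len(means), len(means[0])
    n = [0.0] * num_comp
    f = [[0.0] * dim for _ in range(num_comp)]
    for x in frames:
        # Log of (weight * diagonal Gaussian density) for each component.
        log_p = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            log_p.append(ll)
        # Component posteriors via a numerically stable log-sum-exp.
        peak = max(log_p)
        log_norm = peak + math.log(sum(math.exp(lp - peak) for lp in log_p))
        for c, lp in enumerate(log_p):
            gamma = math.exp(lp - log_norm)
            n[c] += gamma
            for d in range(dim):
                f[c][d] += gamma * x[d]
    # Centre on the component mean, then scale by the standard deviation.
    f_norm = [
        [(f[c][d] - n[c] * means[c][d]) / math.sqrt(variances[c][d])
         for d in range(dim)]
        for c in range(num_comp)
    ]
    return n, f_norm
```

Statistics normalized this way can be passed to an i-vector extractor that assumes zero-mean, unit-variance components, which is one plausible reading of why the abstract calls the scheme straightforward to plug into an existing system.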

References (10)
Luciana Ferrer, Mitchell McLaren, Nicolas Scheffer, Yun Lei, Martin Graciarena, Vikramjit Mitra. A Noise-Robust System for NIST 2012 Speaker Recognition Evaluation. Conference of the International Speech Communication Association, pp. 1981-1985 (2013). DOI: 10.21236/ADA614010
Trausti T. Kristjansson, Alex Acero, Li Deng, Jerry Zhang. HMM adaptation using vector Taylor series for noisy speech recognition. Conference of the International Speech Communication Association, pp. 869-872 (2000).
Yun Lei, Lukas Burget, Nicolas Scheffer. A noise robust i-vector extractor using vector Taylor series for speaker recognition. International Conference on Acoustics, Speech, and Signal Processing, pp. 6788-6791 (2013). DOI: 10.1109/ICASSP.2013.6638976
O. Kalinli, M. L. Seltzer, J. Droppo, A. Acero. Noise Adaptive Training for Robust Automatic Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, pp. 1889-1901 (2010). DOI: 10.1109/TASL.2010.2040522
Simon J.D. Prince, James H. Elder. Probabilistic Linear Discriminant Analysis for Inferences About Identity. International Conference on Computer Vision, pp. 1-8 (2007). DOI: 10.1109/ICCV.2007.4409052
Yun Lei, Lukas Burget, Luciana Ferrer, Martin Graciarena, Nicolas Scheffer. Towards noise-robust speaker recognition using probabilistic linear discriminant analysis. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4253-4256 (2012). DOI: 10.1109/ICASSP.2012.6288858
P. Kenny, P. Ouellet, N. Dehak, V. Gupta, P. Dumouchel. A Study of Interspeaker Variability in Speaker Verification. IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, pp. 980-988 (2008). DOI: 10.1109/TASL.2008.925147
Pierre Ouellet, Najim Dehak, Patrick J. Kenny, Réda Dehak, Pierre Dumouchel. Front-End Factor Analysis for Speaker Verification. IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, pp. 788-798 (2011). DOI: 10.1109/TASL.2010.2064307
Carol Y. Espy-Wilson, Daniel Garcia-Romero. Analysis of i-vector Length Normalization in Speaker Recognition Systems. Conference of the International Speech Communication Association, pp. 249-252 (2011).
Patrick Kenny. Bayesian Speaker Verification with Heavy-Tailed Priors. Odyssey, pp. 14- (2010).