Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006

Authors: Niko Brummer, Lukas Burget, Jan Cernocky, Ondrej Glembek, Frantisek Grezl

DOI: 10.1109/TASL.2007.902870

Abstract: This paper describes and discusses the "STBU" speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium of four partners: Spescom DataVoice (Stellenbosch, South Africa), TNO (Soesterberg, The Netherlands), BUT (Brno, Czech Republic), and the University of Stellenbosch (Stellenbosch, South Africa). The STBU system was a combination of three main kinds of subsystems: 1) GMM, with short-time Mel frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features, 2) Gaussian mixture model-support vector machine (GMM-SVM), using GMM mean supervectors as input to an SVM, and 3) maximum-likelihood linear regression-support vector machine (MLLR-SVM), using MLLR speaker adaptation coefficients derived from an English large vocabulary continuous speech recognition (LVCSR) system. All subsystems made use of supervector subspace channel compensation methods, either eigenchannel adaptation or nuisance attribute projection. We document the design and performance of all subsystems, as well as their fusion and calibration via logistic regression. Finally, we also present a cross-site fusion that was done with several additional systems from other NIST SRE-2006 participants.
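The abstract names logistic regression as the tool for fusing and calibrating subsystem scores. The following is a minimal sketch of that general idea, not the authors' implementation. It fits a weight per subsystem plus an offset so that the weighted sum of scores behaves like a log-odds score. The synthetic scores, the plain gradient-descent optimizer, and the names `train_fusion` and `fuse` are all assumptions made for illustration.

```python
import numpy as np

def train_fusion(scores, labels, lr=0.1, n_iter=2000):
    """Fit linear fusion weights by logistic regression (gradient descent).

    scores: array of shape (n_trials, n_systems), one column per subsystem.
    labels: array of shape (n_trials,), 1 for target trials, 0 for non-target.
    Returns (w, b) such that scores @ w + b is the fused score.
    """
    n_trials, n_systems = scores.shape
    w = np.zeros(n_systems)
    b = 0.0
    for _ in range(n_iter):
        logits = scores @ w + b
        p = 1.0 / (1.0 + np.exp(-logits))      # predicted target probability
        err = p - labels                        # gradient of cross-entropy w.r.t. logits
        w -= lr * (scores.T @ err) / n_trials
        b -= lr * err.mean()
    return w, b

def fuse(scores, w, b):
    """Apply the trained fusion: one fused score per trial."""
    return scores @ w + b

# Synthetic data (assumption): three hypothetical subsystems whose scores
# separate target and non-target trials to different degrees.
rng = np.random.default_rng(0)
n = 500
labels = np.concatenate([np.ones(n), np.zeros(n)])
shifts = np.array([1.0, 1.5, 0.5])
scores = rng.normal(size=(2 * n, 3)) + np.outer(2 * labels - 1, shifts)

w, b = train_fusion(scores, labels)
fused = fuse(scores, w, b)
accuracy = np.mean((fused > 0) == (labels == 1))
```

In this sketch a fused score above zero favors the target hypothesis. A real evaluation would train the fusion on held-out development trials and would typically weight target and non-target trials according to a chosen prior, which is omitted here.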

References (35)
Nikki Mirghafori, Larry P. Heck, An adaptive speaker verification system with speaker dependent a priori decision thresholds. Conference of the International Speech Communication Association (2002).
Andreas Stolcke, Elizabeth Shriberg, Luciana Ferrer, Anand Venkataraman, Sachin S. Kajarekar, MLLR transforms as features in speaker recognition. Conference of the International Speech Communication Association, pp. 2425-2428 (2005).
Stanley Lemeshow, David W. Hosmer, Applied Logistic Regression (1989).
Shou-chun Yin, Patrick Kenny, Richard Rose, Experiments in Speaker Adaptation for Factor Analysis Based Speaker Verification. 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop, pp. 1-6 (2006). DOI: 10.1109/ODYSSEY.2006.248130
Stéphane Pigeon, Pascal Druyts, Patrick Verlinde, Applying Logistic Regression to the Fusion of the NIST'99 1-Speaker Submissions. Digital Signal Processing, vol. 10, pp. 237-248 (2000). DOI: 10.1006/DSPR.1999.0358
Eric Hansen, Raymond Slyh, Timothy Anderson, Supervised and Unsupervised Speaker Adaptation in the NIST 2005 Speaker Recognition Evaluation. 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop, pp. 1-8 (2006). DOI: 10.1109/ODYSSEY.2006.248122
Niko Brümmer, Johan du Preez, Application-independent evaluation of speaker detection. Computer Speech & Language, vol. 20, pp. 230-275 (2006). DOI: 10.1016/J.CSL.2005.08.001
Douglas A. Reynolds, Thomas F. Quatieri, Robert B. Dunn, Speaker Verification Using Adapted Gaussian Mixture Models. Digital Signal Processing, vol. 10, pp. 19-41 (2000). DOI: 10.1006/DSPR.1999.0361
Mark Przybocki, Alvin Martin, Audrey Le, NIST Speaker Recognition Evaluation Chronicles - Part 2. 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop, pp. 1-6 (2006). DOI: 10.1109/ODYSSEY.2006.248120
Pavel Matejka, Lukás Burget, Petr Schwarz, Ondrej Glembek, Martin Karafiat, Frantisek Grezl, J Cernocky, David A van Leeuwen, Niko Brummer, Albert Strasheim, STBU System for the NIST 2006 Speaker Recognition Evaluation. International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 221-224 (2007). DOI: 10.1109/ICASSP.2007.367203