Evaluation of spoken language recognition technology using broadcast speech: performance and challenges.

Authors: Mikel Peñagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel, Mireia Díez, Amparo Varona

Abstract: Spoken Language Recognition (SLR) technology has improved remarkably in recent years, partly thanks to the NIST Language Recognition Evaluations (LRE), which have become standard benchmarks for testing new approaches. These evaluations focus on narrow-band conversational telephone speech and deal with some specific target languages. Recent efforts to expand the scope of SLR assessment include the Albayzin 2008 and 2010 LRE, which use wide-band TV broadcast speech. In this work, a system based on state-of-the-art approaches is developed and evaluated on the Albayzin LRE datasets, with the aim of identifying those conditions that make the task challenging and, eventually, of guiding the design of future evaluations using the same kind of data. We present and analyse performance under different conditions, regarding: (1) the set of languages (including details about how they are confused with each other); (2) the amount of data available to estimate the models; and (3) the presence of background noise.
