Advances in Lecture Recognition: The ISL RT-06S Evaluation System

Authors: Sebastian Stüker, Christian Fügen, Matthias Wölfel, Shajith Ikbal, Mari Ostendorf

DOI:

Keywords: NIST, Evaluation system, Speech recognition, Cross adaptation, Set (abstract data type), Word (computer architecture), Microphone, Transcription (software), Computer science, Language model

Abstract: This paper describes the 2006 lecture recognition system developed at the Interactive Systems Laboratories (ISL) for the individual head-microphone (IHM), single distant microphone (SDM), and multiple distant microphone (MDM) conditions. It was evaluated in the RT-06S rich transcription meeting evaluation sponsored by the US National Institute of Standards and Technology (NIST). We describe the principal differences between our current system and those submitted in previous years, namely improved acoustic and language models, cross adaptation between systems with different front-ends and phoneme sets, and the use of various automatic speech segmentation algorithms. Our system achieved word error rates of 38.5% (53.4%) and 22.9% (32.2%), respectively, on the MDM and IHM conditions of the RT-05S (RT-06S) evaluation set.
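
The word error rates quoted above count word-level substitutions, deletions, and insertions against a reference transcript, divided by the number of reference words. The sketch below illustrates that metric only; it is not the scoring tool used in the evaluation, and the function name `word_error_rate` and the plain whitespace tokenization are assumptions made for illustration. Official scoring typically also applies text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration: tokens are split on whitespace
    and compared exactly, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.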

References (21)
Sebastian Stüker, Mohamed Noamany, Qin Jin, Tanja Schultz, Yik-Cheung Tam, Hua Yu, Thomas Schaaf. The ISL RT04 Mandarin Broadcast News Evaluation System. EARS Rich Transcription Workshop, New York, NY, 10 Nov. 2004 (2004).
Alex Waibel, Hartwig Steusloff, Rainer Stiefelhagen. CHIL - Computers in the Human Interaction Loop. Journal of Machine Vision and Applications, p. 18 (2005).
Matthias Wölfel, John W. McDonough. Combining Multi-Source Far Distance Speech Recognition Strategies: Beamforming, Blind Channel and Confusion Network Combination. Conference of the International Speech Communication Association, pp. 3149-3152 (2005).
Christian Fügen, Matthias Wölfel, Shajith Ikbal, John W. McDonough. Multi-Source Far-Distance Microphone Selection and Combination for Automatic Transcription of Lectures. Conference of the International Speech Communication Association (2006).
Andreas Stolcke, Lidia Mangu, Eric Brill. Finding consensus among words: Lattice-based word error minimization. Conference of the International Speech Communication Association (1999).
Andreas Stolcke. SRILM – An Extensible Language Modeling Toolkit. Conference of the International Speech Communication Association (2002).
J. Makhoul. Linear prediction: A tutorial review. Proceedings of the IEEE, vol. 63, pp. 561-580 (1975). DOI: 10.1109/PROC.1975.9792
M.J.F. Gales. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech & Language, vol. 12, pp. 75-98 (1998). DOI: 10.1006/CSLA.1998.0043
John McDonough, Thomas Schaaf, Alex Waibel. On maximum mutual information speaker-adapted training. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 601-604 (2002). DOI: 10.1109/ICASSP.2002.5743789
Ivan Bulyko, Mari Ostendorf, Andreas Stolcke. Getting more mileage from web text sources for conversational speech language modeling using class-dependent mixtures. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003, short papers (NAACL '03), pp. 7-9 (2003). DOI: 10.3115/1073483.1073486