Speech recognition error analysis on the English MALACH corpus.

Authors: Bhuvana Ramabhadran, Olivier Siohan, Geoffrey Zweig

Keywords: Audio mining, Stress (linguistics), Word error rate, Computer science, Syllable, Speaker recognition, Artificial intelligence, Word recognition, Natural language processing, Acoustic model, Speech recognition, Task (project management)

Abstract: This paper presents an analysis of the word recognition error rate on the English subset of the MALACH corpus. The MALACH project is an NSF-funded research program on developing multilingual access to large audio archives. The archive of interest is a collection of testimonies from 52,000 survivors, liberators, rescuers and witnesses of the Nazi Holocaust, assembled by the Shoah Visual History Foundation. The data has several characteristics that make it quite unusual in the speech community, such as elderly speech, noisy recording conditions and heavily accented speech, which makes it a challenging task for automatic speech recognition (ASR). This paper attempts to identify the factors affecting ASR performance on this task. Signal-to-noise ratio and syllable rate were found to be the two dominant factors explaining the overall word error rate, while we observed no evidence that accent or the speaker's age affected performance. Based on this evidence, noise compensation experiments were carried out, leading to a 1.1% absolute reduction in word error rate.
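Word error rate, the metric analyzed above, is conventionally defined as the number of substitutions (S), deletions (D) and insertions (I) in a minimum edit distance alignment between the reference and the recognizer output, divided by the number of reference words N. The following is a minimal sketch of that standard definition, assuming whitespace tokenization and no text normalization. The function name `word_error_rate` is illustrative, and this is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the archive holds many testimonies"
    hyp = "the archive hold many of testimonies"
    # one substitution (holds -> hold) and one insertion (of) over 5 words
    print(f"WER = {word_error_rate(ref, hyp):.3f}")  # prints "WER = 0.400"
```

Because insertions are counted, WER can exceed 100%. An "absolute" reduction such as the 1.1% reported above is a drop in this quantity measured in percentage points, as opposed to a reduction relative to the baseline WER.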
