Authors: Bhuvana Ramabhadran, Olivier Siohan, Geoffrey Zweig
DOI:
Keywords: Audio mining, Stress (linguistics), Word error rate, Computer science, Syllable, Speaker recognition, Artificial intelligence, Word recognition, Natural language processing, Acoustic model, Speech recognition, Task (project management)
Abstract: This paper presents an analysis of the word recognition error rate on the English subset of the MALACH corpus. The MALACH project is an NSF-funded research program related to the development of multilingual access to large audio archives. The archive of interest is a collection of testimonies from 52,000 survivors, liberators, rescuers and witnesses of the Nazi Holocaust, assembled by the Shoah Visual History Foundation. The data has some unique characteristics that make it quite unusual in the speech community, such as elderly speech, noisy conditions, and heavily accented speech. Hence, it is a challenging task for automatic speech recognition (ASR). This paper attempts to identify the factors affecting ASR performance on that task. It was found that the signal-to-noise ratio and the syllable rate were the two dominant factors explaining the overall word error rate, while we observed no evidence of an impact of accent or the speaker’s age on recognition performance. Based on this evidence, noise compensation experiments were carried out and led to a 1.1% absolute reduction in the word error rate.