A PUBLIC DOMAIN DECODER FOR LARGE VOCABULARY CONVERSATIONAL SPEECH RECOGNITION

Authors: Jonathan Hamaker, Neeraj Deshmukh, Aravind Ganapathiraju, Joseph Picone

Abstract: The high cost of the infrastructure required to conduct state-of-the-art speech recognition research prevents many small groups from evaluating new ideas on large-scale tasks. To overcome this barrier, we are developing an Internet-based speech-to-text (STT) toolkit. In this paper, we present the core component of this system: a decoder that uses a one-pass time-synchronous Viterbi-based search algorithm called trace projection. This decoder can support efficient lattice rescoring using cross-word triphones, lexical trees and n-gram grammars. Its performance in terms of CPU and memory usage is on par with commercial systems of its kind. Preliminary evaluations on the SWITCHBOARD (SWB) corpus have yielded a word error rate of 39%.
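
For readers unfamiliar with the term, the following is a minimal sketch of a generic time-synchronous Viterbi search with beam pruning over a toy two-state HMM. It is not the trace projection algorithm, which the abstract names but does not describe; the function name `viterbi`, the `beam` parameter and the toy probabilities are all assumptions chosen for illustration.

```python
import math


def viterbi(log_init, log_trans, log_emit, beam=math.inf):
    """Generic time-synchronous Viterbi search in the log domain.

    log_init[s]     : log P(start in state s)
    log_trans[p][s] : log P(next state s | current state p)
    log_emit[t][s]  : log P(observation at frame t | state s)
    beam            : states scoring more than `beam` below the best
                      state of the current frame are not extended
    Returns (best state path, log score of that path).
    """
    n_states = len(log_init)
    # Frame 0: every state starts from its initial probability.
    scores = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs = []

    # All hypotheses advance one frame at a time (time-synchronous).
    for t in range(1, len(log_emit)):
        best = max(scores)
        active = [p for p in range(n_states) if scores[p] >= best - beam]
        new_scores = [-math.inf] * n_states
        frame_bp = [-1] * n_states
        for s in range(n_states):
            for p in active:
                cand = scores[p] + log_trans[p][s]
                if cand > new_scores[s]:
                    new_scores[s] = cand
                    frame_bp[s] = p
            new_scores[s] += log_emit[t][s]
        scores = new_scores
        backptrs.append(frame_bp)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    best_score = scores[state]
    path = [state]
    for frame_bp in reversed(backptrs):
        state = frame_bp[state]
        path.append(state)
    path.reverse()
    return path, best_score


def logs(values):
    return [math.log(v) for v in values]


# Toy model (hypothetical numbers): 2 states, 3 frames.
log_init = logs([0.6, 0.4])
log_trans = [logs([0.7, 0.3]), logs([0.4, 0.6])]
log_emit = [logs([0.5, 0.1]), logs([0.4, 0.3]), logs([0.1, 0.6])]

path, score = viterbi(log_init, log_trans, log_emit)
print(path, math.exp(score))
```

A large-vocabulary decoder applies the same frame-by-frame recursion to a much larger search space, so pruning (here the single `beam` threshold) is what keeps CPU and memory usage manageable.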

References (1)
J.J. Godfrey, E.C. Holliman, J. McDaniel, "SWITCHBOARD: telephone speech corpus for research and development," International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 517-520, 1992. DOI: 10.1109/ICASSP.1992.225858