Design and evaluation of acoustic and language models for large scale telephone services

Authors: A. Facco, D. Falavigna, R. Gretter, M. Viganò

DOI: 10.1016/J.SPECOM.2005.07.004

Abstract: This paper describes the specification, design and development phases of two widely used telephone services based on automatic speech recognition. The effort spent on evaluating and tuning these services is discussed in detail. In developing the first service, mainly based on the recognition of “alphanumeric” sequences, a significant part of the work consisted of refining the acoustic models. To increase recognition accuracy we adopted algorithms and methods consolidated in the past over broadcast news transcription tasks. A significant result shows that the use of task specific context dependent phone models reduces the word error rate by about 40% relative to using task independent phone models. Note that the latter result was achieved on a small vocabulary task, significantly different from those generally used in broadcast news transcription. We also investigated both unsupervised and supervised training procedures. Moreover, we studied a novel partly supervised technique that allows us to select in some “optimal” way the speech material to manually transcribe and use for acoustic model training. The proposed procedure gives a performance close to that obtained with a completely supervised training method. In the second service, mainly based on phrase spotting, a wide effort was devoted to language model refinement. In particular, several types of rejection networks were studied to detect out of vocabulary words for the given task; a major result demonstrates that the use of rejection networks based on a class trigram language model reduces the word error rate from 36.7% to 11.1% with respect to using a phone loop network. For the latter service, the benefits and related costs brought by regular grammars, stochastic language models and mixed language models are also reported and discussed. Finally, notice that most of the experiments described in this paper were carried out on field databases collected through the developed services.
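The abstract quotes improvements in relative terms ("about 40% relative") alongside two absolute word error rates (36.7% and 11.1%). As a minimal sketch of how the two views relate, the snippet below computes the absolute and relative reduction for the second pair of figures. The function name `relative_reduction` is hypothetical, and reading 36.7% as the phone loop baseline and 11.1% as the class trigram result is an assumption based on the wording of the abstract.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Assumed mapping of the abstract's figures (percent word error rate).
baseline = 36.7   # phone loop rejection network (assumed baseline)
improved = 11.1   # class trigram rejection network (assumed result)

absolute = baseline - improved                    # difference in percentage points
relative = relative_reduction(baseline, improved) # fraction of the baseline removed

print(f"absolute: {absolute:.1f} points, relative: {relative:.1%}")
```

Under that reading, the drop is 25.6 percentage points, or roughly 70% relative, which is a larger relative gain than the "about 40%" reported for the acoustic model refinement on the first service.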

References (32)
Teresa M. Kamm, Gerard G. L. Meyer, Robustness aspects of active learning for acoustic modeling. Conference of the International Speech Communication Association (2004)
Daniele Falavigna, Marco Orlandi, Alfiero Santarelli, Maximum likelihood endpoint detection with time-domain features. Conference of the International Speech Communication Association (2003)
Daniele Falavigna, Roberto Gretter, Andrea Facco, Marcello Viganò, On the development of telephone applications: some practical issues and evaluation. Conference of the International Speech Communication Association (2004)
Daniele Falavigna, Roberto Gretter, Marco Orlandi, A mixed language model for a dialogue system over the telephone. Conference of the International Speech Communication Association, pp. 585-588 (2000)
Daniele Falavigna, Roberto Gretter, On field experiments of continuous digit recognition over the telephone network. Conference of the International Speech Communication Association (1997)
Renato De Mori, Spoken Dialogues with Computers (1998)
E. Levin, CHRONUS, the next generation. Spoken Language Technology Workshop, pp. 269-271 (1995)
Steve J. Young, Talking to machines (statistically speaking). Conference of the International Speech Communication Association (2002)
L. Lamel, J.L. Gauvain, G. Adda, Investigating lightly supervised acoustic model training. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 477-480 (2001), DOI: 10.1109/ICASSP.2001.940871
R. Gretter, G. Riccardi, On-line learning of language models with word error probability distributions. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 557-560 (2001), DOI: 10.1109/ICASSP.2001.940892