Authors: Martine Adda-Decker, Gilles Adda, Lori Lamel, Jean-Luc Gauvain
DOI:
Keywords:
Abstract: In this paper we present a quantitative investigation into the impact of text normalization on lexica and language models for speech recognition in French. The text normalization process defines what is considered to be a word by the recognition system. Depending on this definition, we can measure different lexical coverages and language model perplexities, both of which are closely related to the speech recognition accuracies obtained on read newspaper texts. Different text normalizations of up to 185M words of newspaper texts are presented along with the corresponding lexical coverage and perplexity measures. Some normalizations were found to be necessary to achieve good lexical coverage, while others were more or less equivalent in this regard. The choice of normalization to create language models for use in the recognition experiments with read newspaper texts was based on these findings. Our best system configuration obtained an 11.2% word error rate in the AUPELF ‘French-speaking’ speech recognizer evaluation test held in February 1997.