Dual-source backoff for enhancing language models

Author: Sehyeong Cho

DOI: 10.1007/978-3-540-30497-5_63

Keywords:

Abstract: This paper proposes a method of combining two n-gram language models to construct a single model. One of the corpora is a very small corpus from the right domain of interest, and the other is a large but less adequate corpus. The method is based on the observation that the small in-domain corpus yields high-quality n-grams but suffers from a sparseness problem, while the other corpus is inadequately biased yet easy to obtain in a much bigger size. The basic idea behind dual-source backoff is essentially the same as Katz's backoff. We ran experiments with 3-gram models built from newspaper corpora of several million to tens of millions of words, together with smaller broadcast news corpora. The target domain was broadcast news. We obtained a significant improvement by incorporating a corpus around one thirtieth the size.
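As a rough illustration of the backoff idea (a minimal sketch, not the paper's exact formulation, which uses Katz's discounting on 3-grams), the code below consults a small in-domain bigram model first and falls back to a larger general-domain model when the in-domain model has not seen the n-gram. The class and function names (BackoffBigramLM, dual_source_prob) and the fixed backoff weight alpha are hypothetical choices made for this example.

```python
from collections import defaultdict

class BackoffBigramLM:
    """Bigram model with a simple absolute discount and a unigram fallback.
    Illustrative stand-in for one of the two n-gram sources."""

    def __init__(self, sentences, discount=0.5):
        self.discount = discount
        self.unigram = defaultdict(int)
        self.bigram = defaultdict(int)
        self.total = 0
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            for w in tokens:
                self.unigram[w] += 1
                self.total += 1
            for a, b in zip(tokens, tokens[1:]):
                self.bigram[(a, b)] += 1

    def seen(self, history, word):
        return self.bigram.get((history, word), 0) > 0

    def prob(self, history, word):
        """Discounted bigram probability, backing off to an add-one unigram."""
        if self.seen(history, word):
            return (self.bigram[(history, word)] - self.discount) / self.unigram[history]
        # probability mass reserved for continuations unseen after this history
        n_types = sum(1 for (h, w), c in self.bigram.items() if h == history and c > 0)
        hist_count = self.unigram.get(history, 0)
        reserved = self.discount * n_types / hist_count if hist_count else 1.0
        return reserved * (self.unigram.get(word, 0) + 1) / (self.total + len(self.unigram))


def dual_source_prob(in_domain, general, history, word, alpha=0.4):
    """Hypothetical dual-source backoff: use the small in-domain model when it
    has seen the n-gram, otherwise back off to the large general-domain model.
    Here alpha is a fixed weight; in a Katz-style scheme it would be chosen per
    history so that the combined distribution normalises."""
    if in_domain.seen(history, word):
        return in_domain.prob(history, word)
    return alpha * general.prob(history, word)


if __name__ == "__main__":
    broadcast = BackoffBigramLM([["the", "anchor", "reported", "the", "news"]])
    newspaper = BackoffBigramLM([["the", "newspaper", "reported", "the", "story"],
                                 ["the", "editor", "wrote", "the", "column"]])
    print(dual_source_prob(broadcast, newspaper, "the", "anchor"))   # in-domain hit
    print(dual_source_prob(broadcast, newspaper, "the", "editor"))   # falls back to general model
```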

References (6)
Katunobu Itou, Atsushi Fujii, Tetsuya Ishikawa, Tomoyosi Akiba, "Selective back-off smoothing for incorporating grammatical constraints into the n-gram language model," Conference of the International Speech Communication Association (2002).
F. Jelinek, R. L. Mercer, L. R. Bahl, J. K. Baker, "Perplexity—a measure of the difficulty of speech recognition tasks," Journal of the Acoustical Society of America, vol. 62 (1977). DOI: 10.1121/1.2016299
S. F. Chen, K. Seymore, R. Rosenfeld, "Topic adaptation for language modeling using unnormalized exponential models," International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 681–684 (1998). DOI: 10.1109/ICASSP.1998.675356
S. Katz, "Estimation of probabilities from sparse data for the language model component of a speech recognizer," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, pp. 400–401 (1987). DOI: 10.1109/TASSP.1987.1165125