Author: Sehyeong Cho
DOI: 10.1007/978-3-540-30497-5_63
Keywords:
Abstract: This paper proposes a method of combining two n-gram language models to construct a single model. One of the models is constructed from a very small corpus of the right domain of interest, and the other from a large but less adequate corpus. The method is based on the observation that the former has high-quality n-grams but suffers from a sparseness problem, while the latter is inadequately biased yet easy to obtain in a bigger size. The basic idea behind dual-source backoff is basically the same as Katz's backoff. We ran experiments with 3-gram models built from newspaper corpora of several millions to tens of millions of words, together with smaller broadcast news corpora; the target domain was broadcast news. We obtained a significant improvement by incorporating a corpus around one thirtieth the size.
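The abstract describes the mechanism only at a high level. Below is a minimal sketch of the dual-source backoff chain it suggests, assuming Python with plain count dictionaries; absolute discounting stands in for the Good-Turing discounts and normalizing alphas of true Katz backoff, so the probabilities are illustrative rather than properly normalized. The function names and the `discount` parameter are hypothetical, not from the paper.

```python
from collections import defaultdict

def count_ngrams(sentences, n=3):
    """Count every k-gram (k = 1..n) in a tokenized corpus; the
    empty tuple holds the total token count so the unigram case
    uses the same lookup as higher orders."""
    counts = defaultdict(int)
    for sent in sentences:
        tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
        for k in range(1, n + 1):
            for i in range(len(tokens) - k + 1):
                counts[tuple(tokens[i:i + k])] += 1
        counts[()] += len(tokens)
    return counts

def dual_source_prob(w, history, in_dom, out_dom, discount=0.5):
    """P(w | history): try the small in-domain counts first, then
    the large out-of-domain counts at the same order, and only
    then recurse on a shortened history.  (Hypothetical sketch:
    absolute discounting replaces Katz's Good-Turing discounts
    and backoff weights, so this is not properly normalized.)"""
    gram = history + (w,)
    for counts in (in_dom, out_dom):
        if counts.get(gram, 0) > 0:
            return (counts[gram] - discount) / counts[history]
    if not history:
        return 1e-7  # tiny floor for words unseen in both corpora
    return dual_source_prob(w, history[1:], in_dom, out_dom, discount)
```

A call such as `dual_source_prob("reports", ("news", "anchor"), in_dom, out_dom)` first checks the in-domain trigram, then the out-of-domain trigram, and only then shortens the history, mirroring the preference the abstract describes: high-quality in-domain n-grams first, the bigger but biased corpus as a fallback.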