Authors: Rui Wang, Hai Zhao, Bao-Liang Lu, Masao Utiyama, Eiichiro Sumita
DOI: 10.1109/TASLP.2015.2425220
Keywords:
Abstract: Larger n-gram language models (LMs) perform better in statistical machine translation (SMT). However, the existing approaches to constructing larger LMs have two main drawbacks: 1) it is not convenient to obtain larger corpora in the same domain as the bilingual parallel corpora used in SMT; 2) most previous studies focus on monolingual information from the target side only, and redundant n-grams have not been fully utilized in SMT. Nowadays, the continuous-space language model (CSLM), especially the neural network language model (NNLM), has shown great improvement in the estimation accuracy of the probabilities for predicting words. However, these CSLM and NNLM approaches still consider monolingual information only or require an additional corpus. In this paper, we propose a novel CSLM-based LM growing method. Compared with the existing approaches, the proposed method enables us to use the bilingual parallel corpus itself. The results show that our new method significantly outperforms the existing approaches in both SMT performance and computational efficiency.
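The abstract only sketches the idea of "LM growing", so the following is a minimal, hypothetical Python sketch of what such a procedure could look like: candidate n-grams harvested from the parallel corpus are scored and added to a baseline n-gram LM when sufficiently probable. The scorer cslm_logprob is a toy count-based stand-in for the paper's CSLM/NNLM, and the names grow_lm and the threshold value are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of n-gram LM growing. A real CSLM/NNLM would map the
# context words to continuous vectors and return P(w_n | w_1..w_{n-1}) from
# a neural network; here a smoothed unigram product stands in for it.
from collections import Counter
from math import log

def ngrams(tokens, n):
    """Yield all n-grams (as tuples) of a token sequence."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

def cslm_logprob(ngram, unigram_counts, total):
    """Toy stand-in for an NNLM score: sum of add-one-smoothed
    unigram log-probabilities (an assumption, not the paper's model)."""
    vocab = len(unigram_counts)
    return sum(log((unigram_counts[w] + 1) / (total + vocab)) for w in ngram)

def grow_lm(base_lm, corpus_sentences, n=3, threshold=-12.0):
    """Add n-grams from corpus_sentences to base_lm when the (toy) CSLM
    score exceeds threshold; n-grams already in the base LM are skipped."""
    unigram_counts = Counter(w for sent in corpus_sentences for w in sent)
    total = sum(unigram_counts.values())
    grown = dict(base_lm)
    for sent in corpus_sentences:
        for ng in ngrams(sent, n):
            if ng in grown:
                continue  # redundant n-gram, already covered
            score = cslm_logprob(ng, unigram_counts, total)
            if score > threshold:
                grown[ng] = score
    return grown

if __name__ == "__main__":
    base = {("the", "cat", "sat"): -4.2}  # tiny baseline n-gram LM
    corpus = [["the", "cat", "sat", "on", "the", "mat"],
              ["the", "dog", "sat", "on", "the", "rug"]]
    for ng, lp in sorted(grow_lm(base, corpus).items()):
        print(" ".join(ng), round(lp, 2))
```

In this sketch the growing step never consults a second monolingual corpus: it scores n-grams drawn from the parallel training data itself, which mirrors the abstract's claim that the method avoids the domain-mismatch problem of external corpora.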