Authors: Nizar Yahya Habash, Bonnie J. Dorr
DOI:
Keywords: Example-based machine translation, Natural language processing, Computer science, Rule-based machine translation, Machine translation, Artificial intelligence, Hybrid machine translation, Natural language generation, Transfer-based machine translation, Grammaticality, Parsing
Abstract: State-of-the-art techniques in Machine Translation (MT) require large amounts of symmetric resources from the source and target languages. This is true regardless of whether the approach is Transfer or Interlingua, Symbolic or Statistical. Symmetry within these approaches is necessary to ensure quality, robustness and retargetability. In reality, such symmetry, in terms of structural transfer lexicons, interlingual dictionaries or parallel corpora, is a major bottleneck in developing any MT system. This dissertation presents an approach that addresses the lack of symmetry by exploiting symbolic and statistical target-language resources for source-poor/target-rich language pairs. The approach is called Generation-Heavy Hybrid Machine Translation (GHMT). Expected source-language resources include a syntactic parser and a simple one-to-many translation dictionary. No transfer rules or complex interlingual representations are used. Rich target-language symbolic resources are used to overgenerate multiple structural variations from a target-glossed dependency representation of source-language sentences. Statistical target-language resources are then used to select amongst the overgenerated translations. The source-target asymmetry of systems developed in this approach makes them more easily retargetable to new source languages. The contributions of this research include: (1) a model for machine translation that transcends the need for source-language knowledge while maintaining a high degree of robustness, accuracy and retargetability; (2) a systematic framework for handling translation divergences that uniformly accommodates a wide range of seemingly different divergence types and their interactions; (3) a hybrid (symbolic-statistical) natural language generation approach that expands the concept of symbolic overgeneration to include conflation and head-swapping variations; (4) the introduction and use of n-grams on a large scale for natural language generation; (5) the creation of several resources that have been used by other researchers, including an extensible system for translating into English and a large-scale categorical variation database for English. An extensive evaluation suggests that GHMT is robust and has superior output in terms of grammaticality and accuracy, relative to a primarily statistical approach.
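The overgenerate-then-select idea described in the abstract can be illustrated with a minimal sketch. This is not the GHMT system: it assumes a flat word-by-word gloss list instead of a dependency representation, produces only lexical (not structural) variations, and ranks candidates with an add-one-smoothed surface bigram model. The toy corpus, the gloss lists, and all function names (`train_bigrams`, `log_prob`, `overgenerate`, `translate`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy target-language corpus (not from the dissertation).
TARGET_CORPUS = [
    "i am hungry",
    "i am cold",
    "she is hungry",
    "i have a book",
]


def train_bigrams(sentences):
    """Count context unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])            # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one-smoothed bigram log-probability of a candidate sentence."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def overgenerate(glosses):
    """Expand one-to-many glosses (one list per slot) into all candidate strings."""
    return [" ".join(choice) for choice in product(*glosses)]


def translate(glosses, model):
    """Select the overgenerated candidate that the target-language model prefers."""
    return max(overgenerate(glosses), key=lambda cand: log_prob(cand, model))


# Assumed dictionary glosses for Spanish "tengo hambre" (illustrative only).
GLOSSES = [["i"], ["have", "am"], ["hunger", "hungry"]]
MODEL = train_bigrams(TARGET_CORPUS)
BEST = translate(GLOSSES, MODEL)
```

In this sketch the source side contributes only a list of candidate glosses, while all ranking knowledge comes from target-language text, which mirrors the source-poor/target-rich asymmetry the abstract describes.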