Generation-heavy hybrid machine translation

Authors: Nizar Yahya Habash, Bonnie J. Dorr


Keywords: Example-based machine translation, Natural language processing, Computer science, Rule-based machine translation, Machine translation, Artificial intelligence, Hybrid machine translation, Natural language generation, Transfer-based machine translation, Grammaticality, Parsing

Abstract: State-of-the-art techniques in Machine Translation (MT) require large amounts of symmetric resources from the source and target languages. This is true regardless of whether the approach is Transfer or Interlingua, Symbolic or Statistical. Symmetry of resources within these approaches is necessary to ensure quality, robustness and retargetability. In reality, the lack of such symmetry, in terms of structural transfer lexicons, interlingual dictionaries and parallel corpora, is a major bottleneck in developing any MT system. This dissertation presents an approach that addresses the lack of resource symmetry by exploiting symbolic and statistical target-language resources for source-poor/target-rich language pairs. The approach is called Generation-Heavy Hybrid Machine Translation (GHMT). Expected source-language resources include a syntactic parser and a simple one-to-many translation dictionary. No transfer rules or complex interlingual representations are used. Rich target-language symbolic resources are used to overgenerate multiple structural variations from a target-glossed syntactic dependency representation of source-language sentences. Statistical target-language resources are then used to select amongst the overgenerated translations. The source-target asymmetry of systems developed in this approach makes them more easily retargetable to new source languages. The contributions of this research include: (1) a model for machine translation that transcends the need for deep source-language knowledge while maintaining a high degree of robustness and retargetability; (2) a systematic framework for handling translation divergences that uniformly accommodates a wide range of seemingly different divergence types and their interactions; (3) a hybrid (symbolic-statistical) generation approach that expands the concept of symbolic overgeneration to include conflation and head-swapping variations; (4) the introduction and use of structural n-grams on a large scale in natural language generation; (5) the creation of several resources that have been used by other researchers, including an extensible system for translating into English and a large-scale categorial variation database for English. An extensive evaluation suggests that GHMT is robust and has superior output quality, in terms of grammaticality and accuracy, relative to a primarily statistical approach.
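The overgenerate-then-select idea in the abstract can be illustrated with a small sketch. This is not the GHMT system: it is a toy under stated assumptions, working on flat word sequences with an add-one-smoothed bigram model rather than on dependency structures with structural n-grams. The corpus, the candidate glosses, and all function names (`train_bigram_model`, `overgenerate`, `select_best`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical target-language corpus standing in for a statistical resource.
TARGET_CORPUS = [
    "I am hungry",
    "I am tired",
    "I have a book",
    "you are hungry",
]

# Hypothetical target-glossed input: each slot lists the candidate English
# glosses that a one-to-many translation dictionary might return.
GLOSSED_SLOTS = [["I"], ["have", "am"], ["hunger", "hungry"]]


def pad(sentence):
    """Tokenize on whitespace and add sentence-boundary markers."""
    return ["<s>"] + sentence.split() + ["</s>"]


def train_bigram_model(corpus):
    """Count bigrams and their histories; return counts plus vocabulary size."""
    bigrams, histories, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = pad(sentence)
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            histories[prev] += 1
    return bigrams, histories, len(vocab)


def log_prob(sentence, model):
    """Add-one-smoothed bigram log-probability of a sentence."""
    bigrams, histories, vocab_size = model
    tokens = pad(sentence)
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )


def overgenerate(slots):
    """Symbolic step (toy): enumerate every combination of candidate glosses."""
    return [" ".join(choice) for choice in product(*slots)]


def select_best(slots, model):
    """Statistical step (toy): keep the candidate the language model prefers."""
    return max(overgenerate(slots), key=lambda s: log_prob(s, model))


MODEL = train_bigram_model(TARGET_CORPUS)
BEST = select_best(GLOSSED_SLOTS, MODEL)
print(BEST)
```

In this sketch all knowledge about what counts as fluent output lives on the target side, in the corpus and the language model, which mirrors the source-poor/target-rich asymmetry the abstract describes. A real system would also generate structural variations rather than only lexical ones.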
