Authors: Stephan Vogel, Ashish Venugopal
DOI:
Keywords: Context-free grammar, Decoding methods, Terminal and nonterminal symbols, Rule-based machine translation, Synchronous context-free grammar, Algorithm, Natural language processing, Computer science, Artificial intelligence, Probabilistic logic, Phrase, Language model
Abstract: Probabilistic Synchronous Context-free Grammars (PSCFGs) [Aho and Ullman, 1969, Wu, 1996] define weighted transduction rules to represent translation and reordering operations. When translation models use features that are defined locally, on each rule, there are efficient dynamic programming algorithms to perform translation with these grammars [Kasami, 1965]. In general, the integration of non-local features into the translation model can make translation NP-hard, requiring decoding approximations that limit the impact of these features. In this thesis, we consider the interaction between two non-local features: the n-gram language model (LM) and the labels on rule nonterminal symbols in the Syntax-Augmented MT (SAMT) grammar [Zollmann and Venugopal, 2006]. While these features do not result in NP-hard search, they would lead to serious increases in wall-clock runtime if naive decoding methods were applied. We develop novel two-pass decoding algorithms that make strong approximations during a first pass, generating a hypergraph of sentence-spanning derivations. In the second pass, we use knowledge about the non-local features to explore the hypergraph for alternative, potentially better translations. We use this approach to integrate the n-gram LM feature as well as the syntactic feature described below. We then perform a systematic comparison of approaches to evaluate the relative impact of PSCFG grammars over a phrase-based baseline, with a focus on nonterminal labels. This comparison addresses important questions about the effectiveness of PSCFGs for a variety of resource conditions. We learn that for language pairs that exhibit long-distance reordering, PSCFGs deliver improvements over comparable phrase-based systems, and that SAMT labels deliver additional small but consistent improvements, even in conjunction with n-gram LMs. Finally, we propose a novel approach to using syntactic labels by extending the PSCFG formalism to represent hard label constraints as soft label preferences. These preferences are used to compute a new feature that reflects the probability that a derivation is syntactically well formed. This feature mitigates the effect of the commonly applied maximum a posteriori (MAP) approximation and can be discriminatively trained in concert with other model features. We report modest improvements in translation quality on a Chinese-to-English translation task.
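The abstract states that rule-local features admit efficient dynamic programming [Kasami, 1965]. The sketch below illustrates that point with a CKY-style Viterbi decoder over a tiny binary synchronous grammar. Everything in it is a hypothetical toy choice and is not taken from the thesis: the romanized source words, the labels, the rule probabilities and the function name cky_translate. The binary rules are either straight or inverted, which follows the ITG style of [Wu, 1996] rather than full SAMT rules.

```python
import math
from collections import defaultdict

# Hypothetical toy grammar, for illustration only.
# Lexical rules: source word -> list of (label, target phrase, log-prob).
LEXICAL = {
    "ta": [("NP", ("he",), math.log(0.9))],
    "chi": [("V", ("eats",), math.log(0.7))],
    "pingguo": [("NP", ("apples",), math.log(0.8))],
}

# Binary synchronous rules: (lhs, left child, right child, inverted?, log-prob).
# An inverted rule keeps the source order B C but emits the target as C B.
BINARY = [
    ("VP", "V", "NP", False, math.log(0.6)),
    ("S", "NP", "VP", False, math.log(0.5)),
    ("S", "VP", "NP", True, math.log(0.1)),
]


def cky_translate(source, goal="S"):
    """Return (best log-prob, target tuple) for the goal label, or None."""
    n = len(source)
    # chart[(i, j)][label] = (best log-prob, target words) for source span i..j
    chart = defaultdict(dict)
    for i, word in enumerate(source):
        for label, target, lp in LEXICAL.get(word, []):
            cell = chart[(i, i + 1)]
            if label not in cell or lp > cell[label][0]:
                cell[label] = (lp, target)
    # Combine smaller spans into larger ones, keeping one best item per label.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            cell = chart[(i, j)]
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for lhs, b, c, inverted, lp in BINARY:
                    if b in left and c in right:
                        lb, tb = left[b]
                        lc, tc = right[c]
                        score = lp + lb + lc  # rule-local scores simply add up
                        target = tc + tb if inverted else tb + tc
                        if lhs not in cell or score > cell[lhs][0]:
                            cell[lhs] = (score, target)
    return chart[(0, n)].get(goal)


print(cky_translate(["ta", "chi", "pingguo"]))
```

Each rule's score depends only on that rule, so keeping a single best item per (span, label) is exact. Adding an n-gram LM would make item scores depend on neighboring target words. Items would then also have to carry their boundary words, which multiplies the chart size. That is the kind of runtime growth the abstract's two-pass approach is designed to avoid.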