Source-side Syntactic Reordering Patterns with Functional Words for Improved Phrase-based SMT

Authors: Jinhua Du, Andy Way, Jie Jiang

DOI:

Keywords: Computer science; Parsing; Syntax; BLEU; Natural language processing; Artificial intelligence; Task (computing); Word (computer architecture); NIST; Phrase; Speech recognition

Abstract: Inspired by previous source-side syntactic reordering methods for SMT, this paper focuses on using automatically learned syntactic reordering patterns with functional words, which indicate structural reorderings between the source and target languages. The approach uses phrase alignments and source-side parse trees for pattern extraction, then filters out those patterns without functional words. Word lattices transformed by the generated patterns are fed into PBSMT systems to incorporate potential reorderings from the inputs. Experiments are carried out on a medium-sized corpus for a Chinese–English SMT task. The proposed method outperforms the baseline system by 1.38% relative on a randomly selected testset and by 10.45% relative on the NIST 2008 testset in terms of BLEU score. Furthermore, a system using just 61.88% of the patterns, those filtered by functional words, obtains performance comparable to the unfiltered one on the randomly selected testset and achieves a 1.74% relative improvement on the NIST 2008 testset.
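The filtering step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify how patterns are represented or which functional words are used, so the `ReorderingPattern` class, the `FUNCTIONAL_WORDS` set, and the helper functions are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical functional-word inventory (assumption: the abstract
# does not list the functional words actually used).
FUNCTIONAL_WORDS = {"的", "把", "被"}


@dataclass(frozen=True)
class ReorderingPattern:
    """Hypothetical pattern: a source-side sequence plus a target order."""
    source_items: tuple  # syntactic labels and/or lexical items
    target_order: tuple  # permutation of source positions


def has_functional_word(pattern, functional_words=FUNCTIONAL_WORDS):
    """Return True if any source-side item is a functional word."""
    return any(item in functional_words for item in pattern.source_items)


def filter_patterns(patterns, functional_words=FUNCTIONAL_WORDS):
    """Keep only the patterns that contain at least one functional word."""
    return [p for p in patterns if has_functional_word(p, functional_words)]


def apply_pattern(pattern, tokens):
    """Reorder tokens (one per source item) according to the pattern."""
    return [tokens[i] for i in pattern.target_order]


# Toy patterns, invented for this example only.
patterns = [
    ReorderingPattern(("NP", "的", "NN"), (2, 1, 0)),  # contains a functional word
    ReorderingPattern(("ADVP", "VP"), (1, 0)),         # purely syntactic, dropped
]
kept = filter_patterns(patterns)
```

In this toy setup only the first pattern survives the filter. In the paper's pipeline the surviving patterns would then be used to build word lattices of alternative source orders for the PBSMT decoder, a step this sketch does not cover.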

References (20)
Felipe Sánchez-Martínez, Andy Way. Marker-Based Filtering of Bilingual Phrase Pairs for SMT. Proceedings of the 13th Annual Conference of the European Association for Machine Translation (2009).
Jinhua Du, Andy Way. The Impact of Source-Side Syntactic Reordering on Hierarchical Phrase-based SMT. Proceedings of the 14th Annual Conference of the European Association for Machine Translation (2010).
Pi-Chuan Chang, Dan Jurafsky, Christopher D. Manning. Disambiguating "DE" for Chinese-English machine translation. Proceedings of the Fourth Workshop on Statistical Machine Translation - StatMT '09, pp. 215-223 (2009). DOI: 10.3115/1626431.1626474
Michael McCord, Fei Xia. Improving a statistical MT system with automatically learned rewrite patterns. Proceedings of the 20th International Conference on Computational Linguistics - COLING '04, pp. 508-514 (2004). DOI: 10.3115/1220355.1220428
Jakob Elming. Syntactic reordering integrated with phrase-based SMT. Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation - SSST '08, pp. 46-54 (2008). DOI: 10.3115/1626269.1626275
Nianwen Xue, Fei Xia, Fu-Dong Chiou, Martha Palmer. The Penn Chinese TreeBank: Phrase structure annotation of a large corpus. Natural Language Engineering, vol. 11, pp. 207-238 (2005). DOI: 10.1017/S135132490400364X
Yaser Al-Onaizan, Kishore Papineni. Distortion Models for Statistical Machine Translation. Meeting of the Association for Computational Linguistics, pp. 529-536 (2006). DOI: 10.3115/1220175.1220242
Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. BLEU. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL '02, pp. 311-318 (2001). DOI: 10.3115/1073083.1073135
Richard Zens, Franz Josef Och, Hermann Ney. Phrase-Based Statistical Machine Translation. Lecture Notes in Computer Science, pp. 18-32 (2002). DOI: 10.1007/3-540-45751-8_2
Philipp Koehn, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, Evan Herbst, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran. Moses: Open Source Toolkit for Statistical Machine Translation. Meeting of the Association for Computational Linguistics, pp. 177-180 (2007). DOI: 10.3115/1557769.1557821