DCU-Symantec Submission for the WMT 2012 Quality Estimation Task

Authors: Fred Hollowood, Raphael Rubino, Rasul Samad Zadeh Kaljahi, Jennifer Foster, Joachim Wagner

Abstract: This paper describes the features and machine learning methods used by Dublin City University (DCU) and SYMANTEC for the WMT 2012 quality estimation task. Two sets of features are proposed: one constrained, i.e. respecting the data limitation suggested by the workshop organisers, and one unconstrained, i.e. using data or tools trained on data that was not provided by the workshop organisers. In total, more than 300 features were extracted and used to train classifiers in order to predict the translation quality of unseen data. In this paper, we focus on a subset of our feature set that we consider to be relatively novel: features based on a topic model built using the Latent Dirichlet Allocation approach, and features based on source and target language syntax extracted using part-of-speech (POS) taggers and parsers. We evaluate nine feature combinations using four classification-based and four regression-based machine learning techniques.
