Authors: Fred Hollowood, Raphael Rubino, Rasul Samad Zadeh Kaljahi, Jennifer Foster, Joachim Wagner
DOI:
Keywords:
Abstract: This paper describes the features and the machine learning methods used by Dublin City University (DCU) and SYMANTEC for the WMT 2012 quality estimation task. Two sets of features are proposed: one constrained, i.e. respecting the data limitation suggested by the workshop organisers, and one unconstrained, i.e. using data or tools trained on data that was not provided by the workshop organisers. In total, more than 300 features were extracted and used to train classifiers in order to predict the translation quality of unseen data. In this paper, we focus on a subset of our feature set that we consider to be relatively novel: features based on a topic model built using the Latent Dirichlet Allocation approach, and features based on source and target language syntax, extracted using part-of-speech (POS) taggers and parsers. We evaluate nine feature combinations using four classification-based and regression-based techniques.