Authors: Yves Bestgen, Scott Jarvis, Steve Pepper
DOI:
Keywords:
Abstract: This paper reports our contribution to the 2013 NLI Shared Task. The purpose of the task was to train a machine-learning system to identify the native-language affiliations of 1,100 texts written in English by nonnative speakers as part of a high-stakes test of general academic proficiency. We trained our system on the new TOEFL11 corpus, which includes 11,000 essays from 11 native-language backgrounds. Our final system used an SVM classifier with over 400,000 unique features consisting of lexical and POS n-grams occurring in at least two texts in the training set. Our system identified the correct native-language affiliations of 83.6% of the test texts, the highest classification accuracy achieved in the shared task.
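The feature-selection step described in the abstract (keeping only n-grams that occur in at least two training texts) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the `min_texts` and `n_max` parameters, and the toy sentences are all hypothetical, and the sketch covers only word n-grams; POS n-grams would need a part-of-speech tagger, which is left out here.

```python
from collections import Counter


def ngrams(tokens, n_max=2):
    """Yield every contiguous n-gram of length 1..n_max as a tuple."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_vocabulary(texts, min_texts=2, n_max=2):
    """Index the n-grams that occur in at least `min_texts` distinct texts."""
    doc_freq = Counter()
    for tokens in texts:
        # set() ensures each text contributes at most once per n-gram
        doc_freq.update(set(ngrams(tokens, n_max)))
    kept = sorted(g for g, df in doc_freq.items() if df >= min_texts)
    return {g: idx for idx, g in enumerate(kept)}


def vectorize(tokens, vocab, n_max=2):
    """Map one text to sparse counts: {feature index: occurrences}."""
    counts = Counter(g for g in ngrams(tokens, n_max) if g in vocab)
    return {vocab[g]: c for g, c in counts.items()}


# Hypothetical toy training texts (assumption: already tokenized).
train = [
    "i am agree with this statement".split(),
    "i am agree that cars are useful".split(),
    "in my opinion cars are useful".split(),
]
vocab = build_vocabulary(train, min_texts=2, n_max=2)
features = vectorize("i am agree".split(), vocab, n_max=2)
```

The sparse count vectors produced this way would then be passed to a classifier such as a linear SVM. The document-frequency threshold discards n-grams seen in only one text, which keeps idiosyncratic features out of the model.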