Maximizing Classification Accuracy in Native Language Identification

Authors: Yves Bestgen, Scott Jarvis, Steve Pepper

DOI:

Keywords:

Abstract: This paper reports our contribution to the 2013 NLI Shared Task. The purpose of the task was to train a machine-learning system to identify the native-language affiliations of 1,100 texts written in English by nonnative speakers as part of a high-stakes test of general academic proficiency. We trained our system on the new TOEFL11 corpus, which includes 11,000 essays written by nonnative speakers from 11 native-language backgrounds. Our final system used an SVM classifier with over 400,000 unique features consisting of lexical and POS n-grams occurring in at least two texts in the training set. Our system identified the correct native-language affiliations of 83.6% of the texts, the highest classification accuracy achieved in the shared task.
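The feature selection rule mentioned in the abstract (keep only n-grams that occur in at least two training texts) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the bigram limit `n_max=2`, the binary presence encoding, the toy sentences, and the function names `ngrams`, `build_vocabulary`, and `binary_features` are all assumptions made for illustration. POS n-grams would additionally require a part-of-speech tagger, which is omitted here.

```python
from collections import Counter


def ngrams(tokens, n_max=2):
    """Yield every word n-gram of length 1..n_max as a tuple."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_vocabulary(texts, n_max=2, min_texts=2):
    """Return the sorted n-grams that occur in at least `min_texts` texts."""
    doc_freq = Counter()
    for text in texts:
        tokens = text.lower().split()  # naive tokenizer (assumption)
        # Use a set so each n-gram is counted once per text.
        doc_freq.update(set(ngrams(tokens, n_max)))
    return sorted(g for g, df in doc_freq.items() if df >= min_texts)


def binary_features(text, vocabulary, n_max=2):
    """Encode a text as 0/1 indicators over the vocabulary (assumed encoding)."""
    present = set(ngrams(text.lower().split(), n_max))
    return [1 if g in present else 0 for g in vocabulary]


# Hypothetical toy "training set"; the real corpus has 11,000 essays.
toy_texts = [
    "i am agree with this statement",
    "i am agree that cars are useful",
    "cars are useful in the city",
]
vocab = build_vocabulary(toy_texts)
vector = binary_features("i am agree", vocab)
```

Vectors built this way could then be passed to any linear SVM implementation. The document-frequency cutoff discards n-grams that appear in only a single text, since those cannot generalize beyond the text they came from.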

References (16)
Moshe Koppel, Jonathan Schler, Kfir Zigdon. Automatically Determining an Anonymous Author's Native Language. Intelligence and Security Informatics, pp. 209-217 (2005). doi:10.1007/11427995_17
Fanny Meunier, Sylviane Granger, Estelle Dagneaux. The International Corpus of Learner English. Handbook and CD-ROM (2002).
Yves Bestgen, Jennifer Thewissen, Sylviane Granger. Error patterns and automatic L1 identification. pp. 154-177 (2012).
Laura Mayfield Tomokiyo, Rosie Jones. You're not from 'round here, are you? Second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies 2001 - NAACL '01, pp. 1-8 (2001). doi:10.3115/1073336.1073367
Daniel Blanchard, Joel Tetreault, Derrick Higgins, Aoife Cahill, Martin Chodorow. TOEFL11: A Corpus of Non-Native English. ETS Research Report Series, vol. 2013, pp. 15- (2013). doi:10.1002/J.2333-8504.2013.TB02331.X
Susan T. Dumais. Improving the retrieval of information from external sources. Behavior Research Methods, Instruments, & Computers, vol. 23, pp. 229-236 (1991). doi:10.3758/BF03203370