Exploring Syntactic Features for Native Language Identification: A Variationist Perspective on Feature Encoding and Ensemble Optimization

Authors: Detmar Meurers, Serhiy Bykh

Keywords: Natural language processing, Encoding (memory), Classifier (linguistics), Artificial intelligence, Task (project management), Syntax, Native-language identification, Computer science, Machine learning, Perspective (graphical), Feature (machine learning), Classifier (UML)

Abstract: In this paper, we systematically explore lexicalized and non-lexicalized local syntactic features for the task of Native Language Identification (NLI). We investigate different types of feature representations in single- and cross-corpus settings, including two representations inspired by a variationist perspective on the choices made in the linguistic system. To combine the different models, we use a probabilities-based ensemble classifier and propose a technique to optimize and tune it. Combining the best-performing syntactic features with four types of n-grams outperforms the best approach of the NLI Shared Task 2013.
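The abstract names two mechanisms without spelling them out: a variationist feature encoding and a probabilities-based ensemble. The Python sketch below illustrates one plausible reading of each; it is not the authors' implementation. It assumes that the syntactic features are phrase-structure rules, that a rule's left-hand side acts as the linguistic "variable" whose alternative right-hand sides compete as "variants", and that the ensemble simply averages per-class probabilities. The paper's actual encodings, ensemble, and tuning technique may differ, and all function names here are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical input: phrase-structure rules extracted from one learner text,
# written as (left-hand side, right-hand side) pairs, e.g. ("NP", "DT NN").
Rule = tuple[str, str]


def frequency_encoding(rules: list[Rule]) -> dict[Rule, float]:
    """Baseline: relative frequency of each rule among all rules in the text."""
    counts = Counter(rules)
    total = sum(counts.values())
    return {rule: n / total for rule, n in counts.items()}


def variationist_encoding(rules: list[Rule]) -> dict[Rule, float]:
    """Relative frequency of each rule among the rules sharing its left-hand side.

    Assumption (illustrative only): the LHS category is the variable and each
    distinct RHS is one of its competing variants, so the values for one LHS
    sum to 1 regardless of how often that category occurs overall.
    """
    counts = Counter(rules)
    lhs_totals: Counter = Counter()
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


def average_probability_ensemble(prob_tables: list[dict[str, float]]) -> str:
    """Return the L1 label with the highest mean probability across base models.

    Each dict holds one base model's class-probability estimates for a text.
    Plain averaging is an assumed stand-in for the paper's tuned ensemble.
    """
    totals: defaultdict = defaultdict(float)
    for probs in prob_tables:
        for label, p in probs.items():
            totals[label] += p / len(prob_tables)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    rules = [("NP", "DT NN"), ("NP", "DT NN"), ("NP", "PRP"), ("VP", "VBD NP")]
    print(frequency_encoding(rules))
    print(variationist_encoding(rules))
    print(average_probability_ensemble([
        {"GER": 0.6, "FRE": 0.4},
        {"GER": 0.3, "FRE": 0.7},
        {"GER": 0.55, "FRE": 0.45},
    ]))
```

For the four example rules, plain relative frequency gives NP → DT NN a value of 0.5, while the variationist encoding gives it 2/3, since it accounts for two of the three NP expansions; VP → VBD NP gets 1.0 because it is the only VP variant observed. In the ensemble example, two of three models favor GER, but averaging the probabilities selects FRE (about 0.517 versus 0.483), which shows how a probability-based combination weighs model confidence rather than counting votes.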
