Authors: Detmar Meurers, Serhiy Bykh
DOI:
Keywords: Natural language processing, Encoding (memory), Classifier (linguistics), Artificial intelligence, Task (project management), Syntax, Native-language identification, Computer science, Machine learning, Perspective (graphical), Feature (machine learning), Classifier (UML)
Abstract: In this paper, we systematically explore lexicalized and non-lexicalized local syntactic features for the task of Native Language Identification (NLI). We investigate different types of feature representations in single-corpus and cross-corpus settings, including two inspired by a variationist perspective on the choices made in the linguistic system. To combine the models, we use a probabilities-based ensemble classifier and propose a technique to optimize and tune it. Combining the best-performing syntactic features with four types of n-grams outperforms the best approach of the NLI Shared Task 2013.