Authors: Radu Tudor Ionescu, Marius Popescu, Aoife Cahill
DOI: 10.3115/V1/D14-1142
Keywords:
Abstract: A common approach in text mining tasks such as text categorization, authorship identification or plagiarism detection is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. In this work, an approach that uses character n-grams as features is proposed for the task of native language identification. Instead of doing standard feature selection, the proposed approach combines several string kernels using multiple kernel learning. Kernel Ridge Regression and Kernel Discriminant Analysis are independently used in the learning stage. The empirical results obtained in all the experiments conducted in this work indicate that the proposed approach achieves state-of-the-art performance in native language identification, reaching an accuracy that is 1.7% above the top scoring system of the 2013 NLI Shared Task. Furthermore, the proposed approach has an important advantage in that it is language independent and linguistic theory neutral. In the cross-corpus experiment, the proposed approach shows that it can also be topic independent, improving the state-of-the-art system by 32.3%.
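As an illustration of combining string kernels over character n-grams, the following is a minimal sketch and not the authors' implementation. The abstract does not say which string kernels are combined, so a p-spectrum kernel (inner product of n-gram counts) is assumed here as one common choice. The function names, the n-gram sizes (2, 3, 4) and the uniform weights are all hypothetical; a multiple kernel learning method would learn the weights from data.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n):
    """Count every contiguous character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a, b, n):
    """p-spectrum kernel: inner product of the two n-gram count vectors."""
    ca, cb = char_ngrams(a, n), char_ngrams(b, n)
    if len(ca) > len(cb):
        ca, cb = cb, ca  # iterate over the smaller profile
    return sum(count * cb[gram] for gram, count in ca.items())


def normalized_kernel(a, b, n):
    """Scale the kernel so that k(x, x) is 1 whenever x has any n-grams."""
    denom = sqrt(spectrum_kernel(a, a, n) * spectrum_kernel(b, b, n))
    return spectrum_kernel(a, b, n) / denom if denom else 0.0


def combined_kernel(a, b, sizes=(2, 3, 4), weights=None):
    """Weighted sum of normalized kernels over several n-gram sizes.

    Uniform weights are an assumption made for this sketch; they stand in
    for weights that multiple kernel learning would estimate.
    """
    if weights is None:
        weights = [1.0 / len(sizes)] * len(sizes)
    return sum(w * normalized_kernel(a, b, n) for w, n in zip(weights, sizes))
```

A Gram matrix built by calling `combined_kernel` on every pair of training essays could then be passed to any kernel method that accepts a precomputed kernel, such as a kernel ridge regression solver. Because the features are raw character sequences, no tokenizer, tagger or other language-specific tool is involved.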