Can characters reveal your native language? A language-independent approach to native language identification

Authors: Radu Tudor Ionescu, Marius Popescu, Aoife Cahill


Abstract: A common approach in text mining tasks such as text categorization, authorship identification or plagiarism detection is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. In this work, an approach that uses character n-grams as features is proposed for the task of native language identification. Instead of doing standard feature selection, the proposed approach combines several string kernels using multiple kernel learning. Kernel Ridge Regression and Kernel Discriminant Analysis are independently used in the learning stage. The empirical results obtained in all the experiments conducted in this work indicate that the proposed approach achieves state-of-the-art performance in native language identification, reaching an accuracy that is 1.7% above the top scoring system of the 2013 NLI Shared Task. Furthermore, the proposed approach has an important advantage in that it is language independent and linguistic theory neutral. In the cross-corpus experiment, the proposed approach shows that it can also be topic independent, improving the state-of-the-art system by 32.3%.
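To make the idea of character n-gram string kernels concrete, here is a minimal sketch in Python. It is not the authors' implementation. The function names, the n-gram sizes, and the unweighted sum used to combine kernels are all assumptions chosen for illustration; the abstract only states that several string kernels are combined through multiple kernel learning.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n):
    """Count every contiguous character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(s, t, n):
    """Spectrum kernel: sum, over shared n-grams, of the product of their counts."""
    cs, ct = char_ngrams(s, n), char_ngrams(t, n)
    return sum(count * ct[gram] for gram, count in cs.items())


def presence_kernel(s, t, n):
    """Presence-based variant: number of distinct n-grams the two strings share."""
    return len(set(char_ngrams(s, n)) & set(char_ngrams(t, n)))


def normalized(kernel, s, t, n):
    """Scale a kernel value so that identical non-empty strings score 1.0."""
    denom = sqrt(kernel(s, s, n) * kernel(t, t, n))
    return kernel(s, t, n) / denom if denom else 0.0


def combined_kernel(s, t, sizes=(2, 3)):
    """Assumed combination rule: unweighted sum of normalized kernels.

    This is the simplest way to merge kernels and only stands in for a
    learned multiple-kernel combination.
    """
    return sum(
        normalized(kernel, s, t, n)
        for kernel in (spectrum_kernel, presence_kernel)
        for n in sizes
    )


if __name__ == "__main__":
    essays = ["I am agree with this", "I agree with this", "Totally different text"]
    gram = [[combined_kernel(a, b) for b in essays] for a in essays]
    for row in gram:
        print([round(value, 3) for value in row])
```

The resulting Gram matrix needs no tokenizer, tagger, or other language-specific tool, which is the sense in which such an approach is language independent. A kernel learner that accepts precomputed kernels, for example scikit-learn's `KernelRidge` with `kernel="precomputed"`, could be trained on it directly.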


