Gender Profiling from PhD Theses Using k-Nearest Neighbour and Sequential Minimal Optimisation

Authors: Hoshiladevi Ramnial, Shireen Panchoo, Sameerchand Pudaruth

DOI: 10.1007/978-3-319-23258-4_32

Abstract: Author profiling is a subfield of text categorisation in which the aim is to predict some characteristics of a writer. In this paper, our objective is to determine the gender of an author based on their writings. Our corpus consists of 10 PhD theses, which were split into equal-sized segments of 1000, 5000 and 10000 words. From this corpus, a total of 446 features were extracted. Some new features, such as combined-words, new word endings and new POS tags, were used in this study. The features were not separated into categories. Two machine learning classifiers, namely k-nearest neighbour and a support vector machines classifier, were used to assess the practicability and utility of the features. We were able to achieve 100% accuracy using the sequential minimal optimisation (SMO) algorithm with 40 document parts. Surprisingly, the simple and lazy k-nearest neighbour (kNN) classifier, which is often discarded in gender profiling studies, achieved 98% accuracy with the same group of documents. Furthermore, 5-NN and 7-NN even outperformed SMO when using 400 document parts of 1000 words each. These values are much higher than those obtained in previous studies. However, we have used a different dataset and the results are therefore not directly comparable. Thus, our experiments provide further evidence that it is possible to infer the gender of an author using a computational linguistic approach.
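The pipeline outlined in the abstract (split each text into equal-sized word segments, turn each segment into a feature vector, then classify with kNN) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of word-ending frequencies as the only feature type, and the use of Euclidean distance are all assumptions made for illustration, and the paper's 446 features are not reproduced here.

```python
import math
from collections import Counter


def split_into_segments(text, segment_size):
    """Split text into consecutive segments of exactly segment_size words.

    Any trailing words that do not fill a whole segment are dropped,
    so that all segments are of equal size.
    """
    words = text.split()
    n_full = len(words) // segment_size
    return [words[i * segment_size:(i + 1) * segment_size] for i in range(n_full)]


def ending_features(words, endings):
    """Relative frequency of each word ending within one segment.

    The list of endings is a hypothetical stand-in for one feature family.
    """
    total = len(words)
    return [sum(w.lower().endswith(e) for w in words) / total for e in endings]


def knn_predict(train_vectors, train_labels, query, k):
    """Label a query vector by majority vote among its k nearest neighbours."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance (assumed)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

In use, each thesis would be passed through `split_into_segments` with a segment size such as 1000, each segment mapped to a vector with `ending_features`, and held-out segments labelled with `knn_predict` using an odd `k` such as 5 or 7 to avoid ties between two classes.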
