Authors: Hoshiladevi Ramnial, Shireen Panchoo, Sameerchand Pudaruth
DOI: 10.1007/978-3-319-23258-4_32
Keywords:
Abstract: Author profiling is a subfield of text categorisation in which the aim is to predict some characteristics of a writer. In this paper, our objective is to determine the gender of an author based on their writings. Our corpus consists of 10 PhD theses, which were split into equal-sized segments of 1000, 5000 and 10000 words. From this corpus, a total of 446 features were extracted. Some new features like combined-words, word endings and POS tags were used in this study. The features were not separated into categories. Two machine learning classifiers, namely the k-nearest neighbour classifier and the support vector machines classifier, were used to assess the practicability and utility of the features. We were able to achieve 100% accuracy using the sequential minimal optimisation (SMO) algorithm with 40 document parts. Surprisingly, the simple and lazy k-nearest neighbour (kNN) classifier, which is often discarded in gender profiling studies, achieved 98% accuracy with the same group of documents. Furthermore, 5-NN and 7-NN even outperformed the SMO algorithm when using 400 document parts of 1000 words each. These values are much higher than those obtained in previous studies. However, we have used a new dataset and the results are therefore not directly comparable. Thus, our experiments provide further evidence that it is possible to infer the gender of an author using a computational linguistic approach.
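The segmentation step mentioned in the abstract (splitting each thesis into equal-sized parts of a fixed word count) could be sketched roughly as follows. This is an illustration only, not the authors' code: the function name `split_into_segments`, the whitespace tokenisation, and the choice to drop a trailing partial segment are all assumptions made here.

```python
from typing import List


def split_into_segments(text: str, segment_size: int) -> List[List[str]]:
    """Split text into consecutive segments of exactly segment_size words.

    Assumptions (not taken from the paper): words are whitespace-delimited,
    and a trailing remainder shorter than segment_size is dropped so that
    every returned segment has the same length.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    words = text.split()
    full_segments = len(words) // segment_size
    return [
        words[i * segment_size:(i + 1) * segment_size]
        for i in range(full_segments)
    ]


# Toy input: 25 synthetic words split at 10 words per segment.
toy_text = " ".join(f"w{i}" for i in range(25))
segments = split_into_segments(toy_text, 10)
```

With a real thesis, `segment_size` would be 1000, 5000 or 10000, and each resulting segment would then be passed to feature extraction before classification.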