Authorship identification for heterogeneous documents

Author: Yuta Tsuboi

Keywords: Information retrieval, Sequential pattern mining, PrefixSpan, Support vector machine, Computer science, Classifier (UML), Data mining, Identification (information), Mailing list, Word (computer architecture)

Abstract: The study of authorship identification in Japanese has for the most part been restricted to literary texts using basic statistical methods. In the present study, the authors of mailing list messages are identified using a machine learning technique (Support Vector Machines). In addition, the classifier trained on the mailing list data is applied to identify the authors of Web documents in order to investigate performance on more heterogeneous documents. Experimental results show better identification performance when the features include not only conventional word N-gram information but also frequent sequential patterns extracted by a data mining technique (PrefixSpan).
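The feature construction described in the abstract can be illustrated with a small sketch. This is not the author's implementation: it assumes whitespace-tokenized English messages in place of morphologically analyzed Japanese text, uses a simplified PrefixSpan over single-word items, and caps pattern length with a hypothetical `max_len` parameter. All function names (`prefixspan`, `contains`, `word_ngrams`, `extract_features`) are illustrative only, and the SVM training step is omitted.

```python
from collections import Counter


def prefixspan(sequences, min_support, max_len=3):
    """Mine frequent sequential patterns (gapped word subsequences).

    Simplified PrefixSpan: each item is a single word, and support is
    the number of sequences that contain the pattern as a subsequence.
    """
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start of remaining suffix) pairs,
        # with at most one entry per sequence.
        support = Counter()
        for sid, start in projected:
            # Count each word once per sequence.
            for word in set(sequences[sid][start:]):
                support[word] += 1
        for word, count in support.items():
            if count < min_support:
                continue
            pattern = prefix + (word,)
            results[pattern] = count
            if len(pattern) < max_len:
                new_projected = []
                for sid, start in projected:
                    try:
                        pos = sequences[sid].index(word, start)
                    except ValueError:
                        continue  # this sequence does not extend the pattern
                    new_projected.append((sid, pos + 1))
                grow(pattern, new_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


def contains(tokens, pattern):
    """True if pattern occurs in tokens in order, gaps allowed."""
    it = iter(tokens)
    return all(word in it for word in pattern)


def word_ngrams(tokens, n):
    """Contiguous word n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tokens, patterns, n=2):
    """Combine word n-gram counts with binary sequential-pattern features."""
    feats = Counter(("ngram",) + g for g in word_ngrams(tokens, n))
    for p in patterns:
        # Single-word patterns add little beyond unigrams, so skip them here.
        if len(p) > 1 and contains(tokens, p):
            feats[("pattern",) + p] = 1
    return feats


# Toy training messages (hypothetical data, not from the study).
messages = [
    "thanks for the quick reply".split(),
    "thanks a lot for the reply".split(),
    "please see the attached file".split(),
]
patterns = prefixspan(messages, min_support=2)
features = extract_features("thanks for your reply".split(), patterns)
```

In this toy example the gapped pattern `("thanks", "reply")` becomes a feature even though the two words are never adjacent, which is the kind of information that contiguous word N-grams alone cannot capture. The resulting feature dictionaries could then be converted to vectors and passed to any SVM implementation.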
