Authorship identification for heterogeneous documents

Author: Yuta Tsuboi

DOI:

Keywords: Information retrieval, Sequential Pattern Mining, PrefixSpan, Support vector machine, Computer science, Classifier (UML), Data mining, Identification (information), Mailing list, Word (computer architecture)

Abstract: The study of authorship identification in Japanese has for the most part been restricted to literary texts using basic statistical methods. In the present study, the authors of mailing list messages are identified using a machine learning technique (Support Vector Machines). In addition, the classifier trained on the mailing list data is applied to identify the author of Web documents in order to investigate performance on more heterogeneous documents. Experimental results show better performance when the features include not only conventional word N-gram information but also frequent sequential patterns extracted by a sequential pattern mining technique (PrefixSpan).
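The two feature types named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names (`word_ngrams`, `prefixspan`, `contains`, `featurize`), the `min_support` and `max_len` parameters, and the toy English tokens are all assumptions made for illustration. It assumes each message is already tokenized into words (Japanese text would first need a morphological analyzer) and uses a simplified PrefixSpan in which every sequence element is a single word. Training the Support Vector Machine on the resulting features is omitted.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]


def word_ngrams(tokens: List[str], n: int) -> List[Pattern]:
    """Return all contiguous word n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def prefixspan(sequences: List[List[str]], min_support: int,
               max_len: int = 3) -> Dict[Pattern, int]:
    """Mine frequent sequential patterns (gaps allowed) by prefix projection.

    Support is the number of sequences that contain the pattern.
    """
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence index, start offset of the remaining suffix).
        support: Counter = Counter()
        for seq_id, start in projected:
            # Count each item at most once per sequence.
            support.update(set(sequences[seq_id][start:]))
        for item, count in sorted(support.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            if len(pattern) < max_len:
                # Keep only the suffix after the first occurrence of item.
                next_projected = [
                    (seq_id, sequences[seq_id].index(item, start) + 1)
                    for seq_id, start in projected
                    if item in sequences[seq_id][start:]
                ]
                mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


def contains(tokens: List[str], pattern: Pattern) -> bool:
    """True if pattern occurs in tokens as a (possibly gapped) subsequence."""
    remaining = iter(tokens)
    return all(item in remaining for item in pattern)


def featurize(tokens: List[str], patterns: Dict[Pattern, int],
              n: int = 2) -> Set[Pattern]:
    """Binary feature set: word n-grams plus matched sequential patterns."""
    features = {("ngram",) + gram for gram in word_ngrams(tokens, n)}
    features |= {("seq",) + p for p in patterns if contains(tokens, p)}
    return features


if __name__ == "__main__":
    # Hypothetical toy messages standing in for tokenized mailing list posts.
    messages = [
        ["thanks", "for", "the", "reply"],
        ["thanks", "a", "lot", "for", "your", "reply"],
        ["please", "reply", "soon"],
    ]
    patterns = prefixspan(messages, min_support=2)
    print(sorted(featurize(messages[1], patterns)))
```

Unlike a word N-gram, a sequential pattern such as ("thanks", "reply") matches even when other words intervene, which is one plausible reason such patterns could complement N-gram features. The binary feature sets would then be mapped to vectors and passed to an SVM classifier.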


