Authors: Yuta Tsuboi
DOI:
Keywords: Information retrieval, Sequential Pattern Mining, PrefixSpan, Support vector machine, Computer science, Classifier (UML), Data mining, Identification (information), Mailing list, Word (computer architecture)
Abstract: The study of authorship identification in Japanese has for the most part been restricted to literary texts using basic statistical methods. In the present study, the authors of mailing list messages are identified using a machine learning technique (Support Vector Machines). In addition, the classifier trained on the mailing list data is applied to identify the authors of Web documents in order to investigate its performance on more heterogeneous documents. Experimental results show better identification performance when we use as features not only conventional word N-gram information but also frequent sequential patterns extracted by a data mining technique (PrefixSpan).