A generic unsupervised method for decomposing multi‐author documents

Authors: Navot Akiva, Moshe Koppel

DOI: 10.1002/ASI.22924

Abstract: Given an unsegmented multi-author text, we wish to automatically separate out distinct authorial threads. We present a novel, entirely unsupervised method that achieves strong results on multiple testbeds, including those for which threads are topically identical. Unlike previous work, our method requires no specialized linguistic tools and can be easily applied to any text.

References (18)
P. Brezillon, P. Bouquet, Lecture Notes in Artificial Intelligence (1999)
J. Estlin Carpenter, G. Harford-Battersby, The Hexateuch: According to the Revised Version (2009)
Sven Meyer zu Eissen, Benno Stein, Intrinsic Plagiarism Detection. Lecture Notes in Computer Science, pp. 565-569 (2006), 10.1007/11735106_66
Hinrich Schütze, Christopher D. Manning, Prabhakar Raghavan, Introduction to Information Retrieval (2005)
Jonathan H. Clark, Charles J. Hannon, A classifier system for author recognition using synonym-based features. Mexican International Conference on Artificial Intelligence, pp. 839-849 (2007), 10.1007/978-3-540-76631-5_80
Moshe Koppel, Navot Akiva, Ido Dagan, Feature instability as a criterion for selecting potential style markers. Journal of the Association for Information Science and Technology, vol. 57, pp. 1519-1525 (2006), 10.1002/ASI.20428
Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas Griffiths, Padhraic Smyth, Mark Steyvers, Learning author-topic models from text corpora. ACM Transactions on Information Systems, vol. 28, pp. 1-38 (2010), 10.1145/1658377.1658381
Yehuda T. Radday, Isaiah and the computer: A preliminary report. Computers and the Humanities, vol. 5, pp. 65-73 (1970), 10.1007/BF02402282