An improved algorithm for unsupervised decomposition of a multi-author document

Author: Chris Giannella

DOI: 10.1002/ASI.23375

Keywords: Set (abstract data type); Baseline (configuration management); Improved algorithm; Data mining; Decomposition (computer science); Cluster analysis; Natural language; Artificial neural network; Computer science

Abstract: This article addresses the problem of unsupervised decomposition of a multi-author text document: identifying the sentences written by each author, assuming the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: apply a Bayesian segmentation algorithm, followed by a segment clustering algorithm. Results are presented from an empirical comparison between BayesAD and AK, a modified version of an approach published by Akiva and Koppel in 2013. BayesAD exhibited greater accuracy than AK in all experiments. However, BayesAD has a parameter that needs to be set, and this parameter had a nontrivial impact on accuracy. Developing an effective method for eliminating this need would be a fruitful direction for future work. When controlling for topic, the accuracy levels of BayesAD and AK were, in all but one case, worse than a baseline wherein one author was assumed to have written the entire input document. Hence, room exists for improved solutions.
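The two-stage pipeline described in the abstract (Bayesian segmentation, then segment clustering) can be illustrated with the minimal Python sketch below. It is not BayesAD. It assumes, for illustration only, the following: lowercase whitespace tokens as features; a Dirichlet compound multinomial segment score, in the spirit of Eisenstein and Barzilay (2008), optimized by dynamic programming; a hypothetical per-segment `penalty` standing in for a tunable parameter; greedy likelihood-based merging of segments, so that the number of authors is not fixed in advance; and accuracy measured under the best one-to-one mapping of clusters to authors. All function names (`segment`, `cluster_segments`, `best_mapping_accuracy`) are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def tokenize(sentence: str) -> list[str]:
    # Hypothetical features: lowercase whitespace tokens only.
    return sentence.lower().split()

def log_dcm(counts: Counter, vocab_size: int, alpha: float) -> float:
    """Log marginal likelihood of a token sequence with these counts under a
    multinomial with a symmetric Dirichlet(alpha) prior."""
    n = sum(counts.values())
    ll = math.lgamma(vocab_size * alpha) - math.lgamma(n + vocab_size * alpha)
    for c in counts.values():
        ll += math.lgamma(c + alpha) - math.lgamma(alpha)
    return ll

def segment(sentences: list[str], alpha: float = 0.1,
            penalty: float = 1.0) -> list[tuple[int, int]]:
    """Stage 1 (sketch): contiguous segmentation by dynamic programming that
    maximizes summed log_dcm minus a hypothetical `penalty` per segment."""
    tokens = [tokenize(s) for s in sentences]
    vocab_size = len({w for t in tokens for w in t})
    n = len(tokens)
    best = [0.0] + [-math.inf] * n  # best[j]: best score for sentences[:j]
    back = [0] * (n + 1)            # back[j]: start index of the last segment
    for j in range(1, n + 1):
        counts: Counter = Counter()
        for i in range(j - 1, -1, -1):  # candidate last segment: sentences[i:j]
            counts.update(tokens[i])
            score = best[i] + log_dcm(counts, vocab_size, alpha) - penalty
            if score > best[j]:
                best[j], back[j] = score, i
    bounds, j = [], n
    while j > 0:  # follow back-pointers from the end of the document
        bounds.append((back[j], j))
        j = back[j]
    return bounds[::-1]

def cluster_segments(sentences: list[str], bounds: list[tuple[int, int]],
                     alpha: float = 0.1) -> list[int]:
    """Stage 2 (sketch): repeatedly merge the pair of clusters whose union most
    increases total log_dcm; stop when no merge helps (number of authors free)."""
    tokens = [tokenize(s) for s in sentences]
    vocab_size = len({w for t in tokens for w in t})
    # Each cluster is (token counts, list of (start, end) segment spans).
    clusters = [(Counter(w for t in tokens[i:j] for w in t), [(i, j)])
                for i, j in bounds]
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a][0], clusters[b][0]
                gain = (log_dcm(ca + cb, vocab_size, alpha)
                        - log_dcm(ca, vocab_size, alpha)
                        - log_dcm(cb, vocab_size, alpha))
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # every possible merge lowers the score
        a, b = best_pair
        merged = (clusters[a][0] + clusters[b][0], clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    # Number clusters by first appearance so the labels are deterministic.
    clusters.sort(key=lambda c: min(start for start, _ in c[1]))
    labels = [0] * len(sentences)
    for label, (_, spans) in enumerate(clusters):
        for i, j in spans:
            labels[i:j] = [label] * (j - i)
    return labels

def best_mapping_accuracy(pred: list[int], gold: list[str]) -> float:
    """Share of sentences labeled correctly under the best one-to-one mapping
    of clusters to authors (brute force; fine for a handful of clusters)."""
    clusters, authors = sorted(set(pred)), sorted(set(gold))
    # Pad with None so surplus clusters can map to "no author" (always wrong).
    padded = authors + [None] * max(0, len(clusters) - len(authors))
    best = 0
    for perm in permutations(padded, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[p] == g for p, g in zip(pred, gold)))
    return best / len(gold)

# Toy document: two "authors" with disjoint vocabularies, pattern A A B B A A.
sents = ["alpha beta gamma"] * 2 + ["delta epsilon zeta"] * 2 + ["alpha beta gamma"] * 2
gold = ["A", "A", "B", "B", "A", "A"]
labels = cluster_segments(sents, segment(sents, penalty=1.0))
print(labels, best_mapping_accuracy(labels, gold))
print("one-author baseline:", best_mapping_accuracy([0] * len(gold), gold))
```

On this toy document, the sketch recovers the A/B pattern ([0, 0, 1, 1, 0, 0], accuracy 1.0), while the one-author baseline scores 4/6. With `penalty=5.0`, the same document collapses into a single segment, so accuracy falls to the baseline. This is a small illustration of how a parameter that must be set can have a nontrivial effect on accuracy.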

References (25)
Graeme Hirst, Julian Brooke, Adam Hammond. Unsupervised Stylistic Segmentation of Poetry with Change Curves and Extrinsic Features. North American Chapter of the Association for Computational Linguistics, pp. 26-35 (2012).
Navot Akiva, Moshe Koppel. A generic unsupervised method for decomposing multi-author documents. Journal of the Association for Information Science and Technology, vol. 64, pp. 2256-2264 (2013). DOI: 10.1002/ASI.22924
Hinrich Schütze, Christopher D. Manning, Prabhakar Raghavan. Introduction to Information Retrieval (2005).
Marti A. Hearst. TextTiling: segmenting text into multi-paragraph subtopic passages. Computational Linguistics, vol. 23, pp. 33-64 (1997).
Hemant Misra, François Yvon, Olivier Cappé, Joemon Jose. Text segmentation: A topic modeling perspective. Information Processing and Management, vol. 47, pp. 528-544 (2011). DOI: 10.1016/J.IPM.2010.11.008
Shlomo Argamon, Jonathan Schler, Moshe Koppel. Computational methods in authorship attribution. Journal of the Association for Information Science and Technology, vol. 60, pp. 9-26 (2009). DOI: 10.1002/ASI.V60:1
Athanasios Kehagias, Pavlina Fragkou, Vassilios Petridis. Linear text segmentation using a dynamic programming algorithm. Conference of the European Chapter of the Association for Computational Linguistics, pp. 171-178 (2003). DOI: 10.3115/1067807.1067831
Navot Akiva, Moshe Koppel. Identifying Distinct Components of a Multi-author Document. European Intelligence and Security Informatics Conference, pp. 205-209 (2012). DOI: 10.1109/EISIC.2012.16
Jacob Eisenstein, Regina Barzilay. Bayesian unsupervised topic segmentation. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), pp. 334-343 (2008). DOI: 10.3115/1613715.1613760