An improved algorithm for unsupervised decomposition of a multi-author document

Author: Chris Giannella

Keywords: Set (abstract data type), Baseline (configuration management), Improved algorithm, Data mining, Decomposition (computer science), Cluster analysis, Natural language, Artificial neural network, Computer science

Abstract: This article addresses the problem of unsupervised decomposition of a multi-author text document: identifying the sentences written by each author, assuming the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: apply a Bayesian segmentation algorithm, followed by a segment clustering algorithm. Results are presented from an empirical comparison between BayesAD and AK, a modified version of an approach published by Akiva and Koppel in 2013. BayesAD exhibited greater accuracy than AK in all experiments. However, BayesAD has a parameter that needs to be set and which had a nontrivial impact on accuracy. Developing an effective method for eliminating the need to set this parameter would be a fruitful direction for future work. When controlling for topic, the accuracy levels were, in all but one case, worse than a baseline wherein one author was assumed to write the entire input document. Hence, room for improved solutions exists.
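The two-stage shape described in the abstract (segment first, then cluster the segments) can be illustrated with a toy Python sketch. This is not BayesAD: the fixed-length segmenter stands in for the Bayesian segmentation step, the greedy cosine-similarity clustering stands in for the segment clustering step, and the names `decompose`, `seg_len`, and `threshold` are hypothetical choices made only for this illustration. The features are raw word counts, so the sketch makes no attempt to control for topic.

```python
import math
from collections import Counter


def fixed_length_segments(sentences, seg_len):
    """Stand-in segmenter (assumption): consecutive runs of seg_len sentences."""
    return [list(range(i, min(i + seg_len, len(sentences))))
            for i in range(0, len(sentences), seg_len)]


def bag_of_words(sentences, indices):
    """Word-count vector for the sentences at the given indices."""
    counts = Counter()
    for i in indices:
        counts.update(sentences[i].lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster_segments(vectors, threshold):
    """Greedy clustering (assumption): join the most similar existing cluster
    when similarity >= threshold, otherwise open a new cluster. The number of
    clusters is therefore not fixed in advance."""
    centroids, labels = [], []
    for vec in vectors:
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            centroids[best] = centroids[best] + vec  # accumulate counts
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels


def decompose(sentences, seg_len=2, threshold=0.3):
    """Return one cluster label per sentence: segment, cluster, propagate."""
    segments = fixed_length_segments(sentences, seg_len)
    vectors = [bag_of_words(sentences, seg) for seg in segments]
    seg_labels = cluster_segments(vectors, threshold)
    sentence_labels = [None] * len(sentences)
    for seg, label in zip(segments, seg_labels):
        for i in seg:
            sentence_labels[i] = label
    return sentence_labels


if __name__ == "__main__":
    doc = ["the cat sat", "the cat ran",
           "stocks fell sharply", "stocks fell again",
           "the cat slept", "the cat ate"]
    print(decompose(doc))
```

Note that this sketch also has parameters that must be chosen by hand (`seg_len` and `threshold`), and its output depends on them; whether either corresponds to the parameter mentioned in the abstract is not something the abstract states.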

References (25)

Efstathios Stamatatos. Intrinsic Plagiarism Detection Using Character n-gram Profiles (2009).

Graeme Hirst, Julian Brooke, Adam Hammond. Unsupervised Stylistic Segmentation of Poetry with Change Curves and Extrinsic Features. North American Chapter of the Association for Computational Linguistics, pp. 26-35 (2012).

Navot Akiva, Moshe Koppel. A generic unsupervised method for decomposing multi‐author documents. Journal of the Association for Information Science and Technology, vol. 64, pp. 2256-2264 (2013). doi:10.1002/ASI.22924

Hinrich Schütze, Christopher D. Manning, Prabhakar Raghavan. Introduction to Information Retrieval (2005).

Marti A. Hearst. TextTiling: segmenting text into multi-paragraph subtopic passages. Computational Linguistics, vol. 23, pp. 33-64 (1997).

Hemant Misra, François Yvon, Olivier Cappé, Joemon Jose. Text segmentation: A topic modeling perspective. Information Processing and Management, vol. 47, pp. 528-544 (2011). doi:10.1016/J.IPM.2010.11.008

Shlomo Argamon, Jonathan Schler, Moshe Koppel. Computational methods in authorship attribution. Journal of the Association for Information Science and Technology, vol. 60, pp. 9-26 (2009). doi:10.1002/ASI.V60:1

Athanasios Kehagias, Fragkou Pavlina, Vassilios Petridis. Linear text segmentation using a dynamic programming algorithm. Conference of the European Chapter of the Association for Computational Linguistics, pp. 171-178 (2003). doi:10.3115/1067807.1067831

Navot Akiva, Moshe Koppel. Identifying Distinct Components of a Multi-author Document. European Intelligence and Security Informatics Conference, pp. 205-209 (2012). doi:10.1109/EISIC.2012.16

Jacob Eisenstein, Regina Barzilay. Bayesian unsupervised topic segmentation. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), pp. 334-343 (2008). doi:10.3115/1613715.1613760
