Author: Chris Giannella
DOI: 10.1002/ASI.23375
Keywords: Set (abstract data type), Baseline (configuration management), Improved algorithm, Data mining, Decomposition (computer science), Cluster analysis, Natural language, Artificial neural network, Computer science
Abstract: This article addresses the problem of unsupervised decomposition of a multi-author text document: identifying the sentences written by each author, assuming the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: apply a Bayesian segmentation algorithm, followed by a segment clustering algorithm. Results are presented from an empirical comparison between BayesAD and AK, a modified version of an approach published by Akiva and Koppel in 2013. BayesAD exhibited greater accuracy than AK in all experiments. However, BayesAD has a parameter that needs to be set and which had a nontrivial impact on accuracy. Developing an effective method for eliminating this need would be a fruitful direction for future work. When controlling for topic, the accuracy levels of BayesAD and AK were, in all but one case, worse than a baseline wherein one author was assumed to write the entire input document. Hence, room for improved solutions exists.