Clustering aggregation

作者: Aristides Gionis , Heikki Mannila , Panayiotis Tsaparas

DOI: 10.1145/1217299.1217303

关键词:

摘要: We consider the following problem: given a set of clusterings, find single clustering that agrees as much possible with input clusterings. This problem, aggregation, appears naturally in various contexts. For example, categorical data is an instance aggregation problem; each attribute can be viewed rows where are grouped together if they take same value on attribute. Clustering also used metaclustering method to improve robustness by combining output multiple algorithms. Furthermore, problem formulation does not require priori information about number clusters; it determined optimization function.In this article, we give formal statement and propose Our algorithms make use connection between correlation clustering. Although problems NP-hard, for several our methods, provide theoretical guarantees quality solutions. work provides best deterministic approximation algorithm variation consider. show how sampling scale large datasets. extensive empirical evaluation demonstrating usefulness

参考文章(26)
Bruno Leclerc, Jean-Pierre Barthélemy, The Median Procedure for Partitions. Partitioning Data Sets. pp. 3- 34 ,(1993)
William F. Punch, Anil K. Jain, Alexander P. Topchy, A mixture model for clustering ensembles siam international conference on data mining. pp. 379- 390 ,(2004)
Constantinos Boulis, Mari Ostendorf, Combining multiple clustering systems european conference on principles of data mining and knowledge discovery. ,vol. 3202, pp. 63- 74 ,(2004) , 10.1007/978-3-540-30116-5_9
Periklis Andritsos, Panayiotis Tsaparas, Renée J. Miller, Kenneth C. Sevcik, LIMBO: Scalable Clustering of Categorical Data extending database technology. pp. 123- 146 ,(2004) , 10.1007/978-3-540-24741-8_9
Padhraic Smyth, Model selection for probabilistic clustering using cross-validatedlikelihood Statistics and Computing. ,vol. 10, pp. 63- 72 ,(2000) , 10.1023/A:1008940618127
Erik D. Demaine, Dotan Emanuel, Amos Fiat, Nicole Immorlica, Correlation clustering in general weighted graphs workshop on approximation and online algorithms. ,vol. 361, pp. 172- 187 ,(2006) , 10.1016/J.TCS.2006.05.008
Chaitanya Swamy, Correlation Clustering: maximizing agreements via semidefinite programming symposium on discrete algorithms. pp. 526- 527 ,(2004) , 10.5555/982792.982866
Ronald Fagin, D. Sivakumar, Ravi Kumar, Comparing top k lists symposium on discrete algorithms. ,vol. 17, pp. 28- 36 ,(2003) , 10.5555/644108.644113
Cynthia Dwork, Ravi Kumar, Moni Naor, D. Sivakumar, Rank aggregation methods for the Web Proceedings of the tenth international conference on World Wide Web - WWW '01. pp. 613- 622 ,(2001) , 10.1145/371920.372165
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim, Rock: A robust clustering algorithm for categorical attributes Information Systems. ,vol. 25, pp. 345- 366 ,(2000) , 10.1016/S0306-4379(00)00022-3