Constraint-based clustering in large databases

Authors: Anthony K. H. Tung, Jiawei Han, Laks V.S. Lakshmanan, Raymond T. Ng

DOI: 10.1007/3-540-44503-X_26

Abstract: Constrained clustering (finding clusters that satisfy user-specified constraints) is highly desirable in many applications. In this paper, we introduce the constrained clustering problem and show that traditional clustering algorithms (e.g., k-means) cannot handle it. A scalable constraint-clustering algorithm is developed in this study, which starts by finding an initial solution that satisfies the constraints and then refines the solution by performing confined object movements under the constraints. Our algorithm consists of two phases: pivot movement and deadlock resolution. For both phases, we show that finding the optimal solution is NP-hard. We then propose several heuristics and show how our algorithm can scale up for large data sets using the heuristic of micro-cluster sharing. By experiments, we show the effectiveness and efficiency of the heuristics.

References (21)
David B Shmoys, Éva Tardos, Karen Aardal, Approximation algorithms for facility location problems. Lecture Notes in Computer Science, pp. 27-33 (2000). DOI: 10.1007/3-540-44436-X_4
Peter J. Rousseeuw, Leonard Kaufman, Finding Groups in Data: An Introduction to Cluster Analysis (1990)
K.I. Aardal, É. Tardos, D.B. Shmoys, Approximation algorithms for facility location problems. Universiteit Utrecht, UU-CS, Department of Computer Science, vol. 9739 (1997)
A. Demiriz, K.P. Bennett, P.S. Bradley, Constrained K-Means Clustering. pp. 8- (2000)
Usama Fayyad, Cory Reina, P. S. Bradley, Scaling clustering algorithms to large databases. Knowledge Discovery and Data Mining, pp. 9-15 (1998)
Finding Groups in Data. John Wiley & Sons, Inc. (1990). DOI: 10.1002/9780470316801
Jon Kleinberg, Christos Papadimitriou, Prabhakar Raghavan, A Microeconomic View of Data Mining. Data Mining and Knowledge Discovery, vol. 2, pp. 311-324 (1998). DOI: 10.1023/A:1009726428407
Richard R. Muntz, Jiong Yang, Wei Wang, STING: A Statistical Information Grid Approach to Spatial Data Mining. Very Large Data Bases, pp. 186-195 (1997)
Raymond T. Ng, Jiawei Han, Efficient and Effective Clustering Methods for Spatial Data Mining. Very Large Data Bases, pp. 144-155 (1994)
Hans-Peter Kriegel, Martin Ester, Jörg Sander, Xiaowei Xu, A density-based algorithm for discovering clusters in large spatial databases with noise. Knowledge Discovery and Data Mining, pp. 226-231 (1996)