Constraint-driven clustering

作者: Rong Ge , Martin Ester , Wen Jin , Ian Davidson

DOI: 10.1145/1281192.1281229

关键词:

摘要: Clustering methods can be either data-driven or need-driven. Data-driven intend to discover the true structure of underlying data while need-driven aims at organizing meet certain application requirements. Thus, (e.g. constrained) clustering is able find more useful and actionable clusters in applications such as energy aware sensor networks, privacy preservation, market segmentation. However, existing constrained require users provide number clusters, which often unknown advance, but has a crucial impact on result. In this paper, we argue that natural way generate let application-specific constraints decide clusters. For purpose, introduce novel cluster model, Constraint-Driven (CDC), finds an priori unspecified compact satisfy all user-provided constraints. Two general types are considered, i.e. minimum significance variance constraints, well combinations these two types. We prove NP-hardness CDC problem with different propose dynamic structure, CD-Tree, organizes points leaf nodes each node approximately satisfies minimizes objective function. Based CD-Trees, develop efficient algorithm solve new problem. Our experimental evaluation synthetic real datasets demonstrates quality generated scalability algorithm.

参考文章(35)
A. Demiriz, K.P. Bennett, P.S. Bradley, Constrained K-Means Clustering pp. 8- ,(2000)
S. S. Ravi, Ian Davidson, Identifying and generating easy sets of constraints for clustering national conference on artificial intelligence. pp. 336- 341 ,(2006)
Rina Panigrahy, Rajeev Motwani, Krishnaram Kenthapadi, Dilys Thomas, Tomas Feder, Gagan Aggarwal, An Zhu, Approximation Algorithms for k-Anonymity Journal of Privacy Technology. ,(2005)
Joydeep Ghosh, Alexander Strehl, Clustering and Visualization of Retail Market Baskets Advanced Information and Knowledge Processing. pp. 75- 102 ,(2005) , 10.1007/1-84628-183-0_3
Donald Roy Woods, Drawing planar graphs ,(1981)
Jon Kleinberg, Christos Papadimitriou, Prabhakar Raghavan, A Microeconomic View of Data Mining Data Mining and Knowledge Discovery. ,vol. 2, pp. 311- 324 ,(1998) , 10.1023/A:1009726428407
Alexander Strehl, Joydeep Ghosh, A Scalable Approach to Balanced, High-Dimensional Clustering of Market-Baskets ieee international conference on high performance computing data and analytics. pp. 525- 536 ,(2000) , 10.1007/3-540-44467-X_48
Wen Jin, Rong Ge, Weining Qian, On robust and effective k-anonymity in large databases knowledge discovery and data mining. pp. 621- 636 ,(2006) , 10.1007/11731139_72
Charu C. Aggarwal, Philip S. Yu, A Condensation Approach to Privacy Preserving Data Mining extending database technology. pp. 183- 199 ,(2004) , 10.1007/978-3-540-24741-8_12
Anthony K. H. Tung, Jiawei Han, Laks V.S. Lakshmanan, Raymond T. Ng, Constraint-based clustering in large databases international conference on database theory. pp. 405- 419 ,(2001) , 10.1007/3-540-44503-X_26