P3C: A Robust Projected Clustering Algorithm

Authors: Gabriela Moise, Jorg Sander, Martin Ester

DOI: 10.1109/ICDM.2006.123

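Abstract: Projected clustering has emerged as a possible solution to the challenges associated with clustering in high dimensional data. A projected cluster is a subset of points together with a subset of attributes, such that the cluster points project onto a small range of values in each of these attributes, and are uniformly distributed in the remaining attributes. Existing algorithms for projected clustering rely on parameters whose appropriate values are difficult to set by the user, or are unable to identify projected clusters with few relevant attributes. In this paper, we present a robust algorithm for projected clustering that can effectively discover projected clusters in the data while minimizing the number of required input parameters. In contrast to all previous approaches, our algorithm can discover, under very general conditions, the true number of projected clusters. We show through an extensive experimental evaluation that our algorithm: (1) significantly outperforms existing algorithms for projected clustering in terms of accuracy; (2) is effective in detecting very low-dimensional projected clusters embedded in high dimensional spaces; (3) is effective in detecting clusters with varying orientation in their relevant subspaces; (4) is scalable with respect to large data sets and a high number of dimensions.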
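The definition of a projected cluster given in the abstract can be made concrete with a small synthetic example. The sketch below is not the P3C algorithm; it only generates one cluster that satisfies the definition (narrow range on the relevant attributes, uniform on the rest) and then checks which attributes have a narrow range. The function names, the unit-interval data domain, and the `max_spread` threshold are illustrative assumptions that do not come from the paper.

```python
import random


def make_projected_cluster(n_points, n_dims, relevant, low, high, seed=0):
    """Generate points in [0, 1]^n_dims forming one projected cluster.

    On each attribute in `relevant`, values fall in the narrow interval
    [low, high]; on every other attribute they are uniform on [0, 1].
    (Hypothetical helper, for illustration only.)
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        point = [
            rng.uniform(low, high) if j in relevant else rng.uniform(0.0, 1.0)
            for j in range(n_dims)
        ]
        points.append(point)
    return points


def attribute_spreads(points):
    """Return (max - min) of the points' values on each attribute."""
    n_dims = len(points[0])
    return [
        max(p[j] for p in points) - min(p[j] for p in points)
        for j in range(n_dims)
    ]


def narrow_attributes(points, max_spread):
    """Attributes on which the points project onto a range of width <= max_spread.

    `max_spread` is an assumed, hand-picked threshold, not a P3C parameter.
    """
    return {j for j, s in enumerate(attribute_spreads(points)) if s <= max_spread}


# One cluster of 200 points in 6 dimensions, relevant on attributes 1 and 4.
RELEVANT = {1, 4}
cluster = make_projected_cluster(200, 6, RELEVANT, low=0.40, high=0.50)
found = narrow_attributes(cluster, max_spread=0.15)
print(sorted(found))
```

Here the cluster membership is known in advance, so recovering the relevant attributes is trivial. The problem the paper addresses is harder: both the point subsets and their attribute subsets are unknown, and several clusters plus noise share the same data set.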

References (15)
Ramakrishnan Srikant, Rakesh Agrawal. Fast algorithms for mining association rules. Very Large Data Bases, pp. 580-592 (1998).
Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft. When Is "Nearest Neighbor" Meaningful? International Conference on Database Theory, pp. 217-235 (1999). DOI: 10.1007/3-540-49257-7_15
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD '98), vol. 27, pp. 94-105 (1998). DOI: 10.1145/276304.276314
Cecilia M. Procopiuc, Michael Jones, Pankaj K. Agarwal, T. M. Murali. A Monte Carlo algorithm for fast projective clustering. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD '02), pp. 418-427 (2002). DOI: 10.1145/564691.564739
Lance Parsons, Ehtesham Haque, Huan Liu. Subspace clustering for high dimensional data. ACM SIGKDD Explorations Newsletter, vol. 6, pp. 90-105 (2004). DOI: 10.1145/1007730.1007731
A. P. Dempster, N. M. Laird, D. B. Rubin. Maximum Likelihood from Incomplete Data Via the EM Algorithm. Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, pp. 1-22 (1977). DOI: 10.1111/J.2517-6161.1977.TB01600.X
Charu C. Aggarwal, Joel L. Wolf, Philip S. Yu, Cecilia Procopiuc, Jong Soo Park. Fast algorithms for projected clustering. International Conference on Management of Data, vol. 28, pp. 61-72 (1999). DOI: 10.1145/304181.304188
Peter J. Rousseeuw, Bert C. van Zomeren. Unmasking Multivariate Outliers and Leverage Points. Journal of the American Statistical Association, vol. 85, pp. 633-639 (1990). DOI: 10.1080/01621459.1990.10474920
U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, A. J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America, vol. 96, pp. 6745-6750 (1999). DOI: 10.1073/PNAS.96.12.6745
Kevin Y Yip, David W Cheung, Michael K Ng. HARP: a practical projected clustering algorithm. IEEE Transactions on Knowledge and Data Engineering, vol. 16, pp. 1387-1397 (2004). DOI: 10.1109/TKDE.2004.74