PCS: An Efficient Clustering Method for High-Dimensional Data.

Authors: Cindy X. Chen, Jie Wang, Wei Li

DOI:

Keywords: Data mining; CURE data clustering algorithm; Clustering high-dimensional data; Data stream clustering; Correlation clustering; Computer science; Consensus clustering; Cluster analysis; Determining the number of clusters in a data set; Canopy clustering algorithm

Abstract: Clustering algorithms play an important role in data analysis and information retrieval. How to obtain a clustering for a large set of high-dimensional data suitable for database applications remains a challenge. We devise in this paper a set-theoretic clustering method called PCS (Pairwise Consensus Scheme) for high-dimensional data. Given a large set of d-dimensional data, PCS first constructs (d choose p) clusterings, where p ≤ d is a small number (e.g., p = 2 or p = 3) and each clustering is constructed on the data projected onto a combination of p selected dimensions using an existing p-dimensional clustering algorithm. PCS then constructs, using a greedy pairwise comparison technique based on a recent clustering algorithm [1], a near-optimal consensus clustering from these projected clusterings to be the final clustering of the original data set. We show that PCS incurs only a moderate I/O cost, and that its memory requirement is independent of the data size. Finally, we carry out numerical experiments to demonstrate the efficiency of PCS.
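The two stages described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: `grid_cluster` is a hypothetical stand-in for "an existing p-dimensional clustering algorithm", and `majority_consensus` is a naive co-association vote used in place of the greedy pairwise comparison technique of [1], whose details are not given here. All function names and the `cell` parameter are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def grid_cluster(points, dims, cell=1.0):
    """Stand-in p-dimensional clusterer (assumption): points that fall in
    the same grid cell of the projected space receive the same label."""
    return [tuple(int(pt[d] // cell) for d in dims) for pt in points]


def projected_clusterings(points, p, cell=1.0):
    """Build one clustering per combination of p dimensions,
    i.e. (d choose p) clusterings in total."""
    d = len(points[0])
    return {
        dims: grid_cluster(points, dims, cell)
        for dims in combinations(range(d), p)
    }


def majority_consensus(clusterings, n):
    """Naive consensus (assumption, not the method of [1]): link two points
    when a strict majority of clusterings put them in the same cluster,
    then return connected components via union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    labelings = list(clusterings.values())
    for i, j in combinations(range(n), 2):
        votes = sum(lab[i] == lab[j] for lab in labelings)
        if 2 * votes > len(labelings):
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


points = [
    (0.1, 0.2, 0.3), (0.4, 0.1, 0.2),  # group near the origin
    (5.2, 5.1, 5.3), (5.4, 5.3, 5.1),  # group near (5, 5, 5)
]
clusterings = projected_clusterings(points, p=2)
assert len(clusterings) == comb(3, 2)  # three 2-dimensional projections
labels = majority_consensus(clusterings, len(points))
```

The pairwise vote above visits every pair of points and holds all labels in memory, so it does not reproduce the moderate I/O cost or the size-independent memory requirement claimed for PCS; it only shows how clusterings on projected dimension combinations can be merged into one clustering of the original data.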

References (12)
Usama Fayyad, Cory Reina, P. S. Bradley. Scaling clustering algorithms to large databases. Knowledge Discovery and Data Mining, pp. 9-15 (1998).
Raymond T. Ng, Jiawei Han. Efficient and effective clustering methods for spatial data mining. Very Large Data Bases, pp. 144-155 (1994).
Hans-Peter Kriegel, Martin Ester, Jörg Sander, Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. Knowledge Discovery and Data Mining, pp. 226-231 (1996).
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD '98), vol. 27, pp. 94-105 (1998). doi:10.1145/276304.276314
Dan Gusfield. Partition-distance: A problem and class of perfect graphs arising in clustering. Information Processing Letters, vol. 82, pp. 159-164 (2002). doi:10.1016/S0020-0190(01)00263-0
Chun-Hung Cheng, Ada Waichee Fu, Yi Zhang. Entropy-based subspace clustering for mining numerical data. Knowledge Discovery and Data Mining, pp. 84-93 (1999). doi:10.1145/312129.312199
Piotr Berman, Bhaskar DasGupta, Ming-Yang Kao, Jie Wang. On constructing an optimal consensus clustering from multiple clusterings. Information Processing Letters, vol. 104, pp. 137-145 (2007). doi:10.1016/J.IPL.2007.06.008
Charu C. Aggarwal, Joel L. Wolf, Philip S. Yu, Cecilia Procopiuc, Jong Soo Park. Fast algorithms for projected clustering. International Conference on Management of Data, vol. 28, pp. 61-72 (1999). doi:10.1145/304181.304188
Tian Zhang, Raghu Ramakrishnan, Miron Livny. BIRCH: An efficient data clustering method for very large databases. International Conference on Management of Data, vol. 25, pp. 103-114 (1996). doi:10.1145/233269.233324
Charu C. Aggarwal, Philip S. Yu. Finding generalized projected clusters in high dimensional spaces. International Conference on Management of Data, vol. 29, pp. 70-81 (2000). doi:10.1145/335191.335383