PCS: An Efficient Clustering Method for High-Dimensional Data.

DOI:

Keywords: Data mining, CURE data clustering algorithm, Clustering high-dimensional data, Data stream clustering, Correlation clustering, Computer science, Consensus clustering, Cluster analysis, Determining the number of clusters in a data set, Canopy clustering algorithm

Abstract: Clustering algorithms play an important role in data analysis and information retrieval. How to obtain a clustering for a large set of high-dimensional data suitable for database applications remains a challenge. We devise in this paper a set-theoretic clustering method called PCS (Pairwise Consensus Scheme) for high-dimensional data. Given a large set of d-dimensional data, PCS first constructs $\binom{d}{p}$ clusterings, where p ≤ d is a small number (e.g., p = 2 or p = 3) and each clustering is constructed on data projected to a combination of p selected dimensions using an existing p-dimensional clustering algorithm. PCS then constructs, using a greedy pairwise comparison technique based on a recent clustering algorithm [1], a near-optimal consensus clustering from these projected clusterings to be the final clustering of the original data set. We show that PCS incurs only a moderate I/O cost, and that the memory requirement is independent of the data size. Finally, we carry out numerical experiments to demonstrate the efficiency of PCS.
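The two stages described in the abstract (cluster every p-dimensional projection, then merge the results into one consensus clustering) can be illustrated with a small sketch. This is not the authors' implementation: `grid_cluster` is a hypothetical toy stand-in for "an existing p-dimensional clustering algorithm", and `majority_consensus` is a simple majority-vote substitute for the paper's greedy pairwise comparison technique, whose details the abstract does not give. The sketch also compares all pairs of points in memory, so it does not reproduce the I/O and memory behavior claimed for PCS.

```python
from itertools import combinations


def grid_cluster(points, dims, cell=1.0):
    """Toy stand-in for a p-dimensional clustering algorithm (assumption):
    points in the same grid cell of the projected space share a label."""
    return [tuple(int(pt[d] // cell) for d in dims) for pt in points]


def projected_clusterings(points, p=2, cell=1.0):
    """Build one clustering per combination of p dimensions,
    giving (d choose p) clusterings in total."""
    d = len(points[0])
    return {dims: grid_cluster(points, dims, cell)
            for dims in combinations(range(d), p)}


def majority_consensus(clusterings, n):
    """Stand-in consensus step (NOT the paper's greedy pairwise scheme):
    link two points if they share a cluster in more than half of the
    clusterings, then return the connected components as labels."""
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    labelings = list(clusterings.values())
    for i, j in combinations(range(n), 2):
        votes = sum(lab[i] == lab[j] for lab in labelings)
        if 2 * votes > len(labelings):
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    pts = [(0.2, 0.3, 0.1), (0.4, 0.1, 0.3),
           (5.1, 5.2, 5.3), (5.4, 5.6, 5.2)]
    cl = projected_clusterings(pts, p=2)
    print(len(cl))                            # 3 projections for d = 3, p = 2
    print(majority_consensus(cl, len(pts)))   # [0, 0, 1, 1]
```

With d = 3 and p = 2 the sketch builds three projected clusterings; the two well-separated groups agree in every projection, so the vote recovers them as two clusters.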


References (12)

Usama Fayyad, Cory Reina, P. S. Bradley, Scaling clustering algorithms to large databases. Knowledge Discovery and Data Mining, pp. 9-15 (1998)

Raymond T. Ng, Jiawei Han, Efficient and Effective Clustering Methods for Spatial Data Mining. Very Large Data Bases, pp. 144-155 (1994)

Hans-Peter Kriegel, Martin Ester, Jörg Sander, Xiaowei Xu, A density-based algorithm for discovering clusters in large spatial databases with noise. Knowledge Discovery and Data Mining, pp. 226-231 (1996)

Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan, Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD '98), vol. 27, pp. 94-105 (1998), 10.1145/276304.276314

Dan Gusfield, Partition-distance: A problem and class of perfect graphs arising in clustering. Information Processing Letters, vol. 82, pp. 159-164 (2002), 10.1016/S0020-0190(01)00263-0

Chun-Hung Cheng, Ada Waichee Fu, Yi Zhang, Entropy-based subspace clustering for mining numerical data. Knowledge Discovery and Data Mining, pp. 84-93 (1999), 10.1145/312129.312199

Piotr Berman, Bhaskar DasGupta, Ming-Yang Kao, Jie Wang, On constructing an optimal consensus clustering from multiple clusterings. Information Processing Letters, vol. 104, pp. 137-145 (2007), 10.1016/J.IPL.2007.06.008

Charu C. Aggarwal, Joel L. Wolf, Philip S. Yu, Cecilia Procopiuc, Jong Soo Park, Fast algorithms for projected clustering. International Conference on Management of Data, vol. 28, pp. 61-72 (1999), 10.1145/304181.304188

Tian Zhang, Raghu Ramakrishnan, Miron Livny, BIRCH: an efficient data clustering method for very large databases. International Conference on Management of Data, vol. 25, pp. 103-114 (1996), 10.1145/233269.233324

Charu C. Aggarwal, Philip S. Yu, Finding generalized projected clusters in high dimensional spaces. International Conference on Management of Data, vol. 29, pp. 70-81 (2000), 10.1145/335191.335383
