Authors: Cindy X. Chen, Jie Wang, Wei Li
DOI:
Keywords: Data mining, CURE data clustering algorithm, Clustering high-dimensional data, Data stream clustering, Correlation clustering, Computer science, Consensus clustering, Cluster analysis, Determining the number of clusters in a data set, Canopy clustering algorithm
Abstract: Clustering algorithms play an important role in data analysis and information retrieval. How to obtain a clustering for a large set of high-dimensional data suitable for database applications remains a challenge. We devise in this paper a set-theoretic clustering method called PCS (Pairwise Consensus Scheme) for high-dimensional data. Given a large set of d-dimensional data, PCS first constructs $\binom{d}{p}$ clusterings, where p ≤ d is a small number (e.g., p = 2 or 3), with each clustering constructed on the data projected onto a combination of p selected dimensions using an existing p-dimensional clustering algorithm. PCS then constructs, using a greedy pairwise comparison technique based on a recent clustering algorithm [1], a near-optimal consensus clustering from these projected clusterings to be the final clustering of the original data set. We show that PCS incurs only a moderate I/O cost, and that its memory requirement is independent of the data size. Finally, we carry out numerical experiments to demonstrate the efficiency of PCS.
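The projection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the p-dimensional clusterer or spell out the greedy pairwise comparison technique of [1], so every function name here is hypothetical. `grid_cluster` is an assumed stand-in that groups points by grid cell, and `naive_consensus` is a deliberately simple substitute that takes the common refinement (pairwise set intersection) of all projected clusterings, in place of the near-optimal consensus construction.

```python
from functools import reduce
from itertools import combinations


def project(points, dims):
    """Keep only the coordinates listed in dims for every point."""
    return [tuple(pt[i] for i in dims) for pt in points]


def grid_cluster(projected, cell=1.0):
    """Assumed stand-in for 'an existing p-dimensional clustering algorithm':
    points that fall into the same grid cell form one cluster.
    Returns a list of sets of point indices."""
    clusters = {}
    for idx, pt in enumerate(projected):
        key = tuple(int(x // cell) for x in pt)
        clusters.setdefault(key, set()).add(idx)
    return list(clusters.values())


def projected_clusterings(points, p, cell=1.0):
    """Build one clustering per combination of p dimensions,
    i.e. C(d, p) clusterings for d-dimensional points."""
    d = len(points[0])
    return {
        dims: grid_cluster(project(points, dims), cell)
        for dims in combinations(range(d), p)
    }


def intersect_clusterings(a, b):
    """Common refinement of two clusterings: two points stay together
    only if both clusterings put them in the same cluster."""
    return [x & y for x in a for y in b if x & y]


def naive_consensus(clusterings):
    """Simplified substitute for the consensus step (NOT the greedy
    technique from [1]): intersect all clusterings pairwise in turn."""
    return reduce(intersect_clusterings, clusterings)
```

For example, calling `projected_clusterings(points, p=2)` on 3-dimensional points yields three clusterings, one each for the dimension pairs (0, 1), (0, 2), and (1, 2). Passing their values to `naive_consensus` keeps two points in the same cluster only if every projection agrees. That rule is far stricter than a near-optimal consensus and is used here only to keep the sketch short.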