Clusterability Detection and Initial Seed Selection in Large Data Sets

Authors: Mukkai Krishnamoorthy, Scott Epter, Mohammed Zaki

Keywords: Variety (cybernetics), Data set, Cluster analysis, Cluster (physics), Selection (genetic algorithm), Value (computer science), Computer science, Perspective (graphical), Basis (linear algebra), Data mining

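Abstract: The need for a preliminary assessment of the clustering tendency, or clusterability, of massive data sets is known. A good clusterability detection method should serve to influence the decision as to whether to cluster at all, as well as provide useful seed input to a chosen clustering algorithm. We present a framework for the definition of the clusterability of a data set from a distance-based perspective. We discuss a graph-based system for detecting clusterability and generating seed information, including an estimate of the value of k, the number of clusters in the data set, which is an input parameter to many distance-based clustering methods. The output of our method is tunable to accommodate a wide variety of clustering methods. We have conducted a number of experiments using our methodology with stock market data and with the well-known BIRCH data sets, in two as well as higher dimensions. Based on our results, we find that our methodology can serve as the basis for much future work in this area. We report our results and discuss promising future directions.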
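
The abstract does not spell out how the graph is built, so the following is only an illustrative sketch of the general idea of a distance-based graph yielding an estimate of k and seed points. It is not the authors' algorithm. The function name `estimate_seeds`, the parameters `radius` and `min_size`, and the sample points are all hypothetical. Points closer than `radius` are linked, and each connected component with at least `min_size` members is treated as one candidate cluster whose centroid serves as a seed.

```python
import math
from collections import defaultdict


def estimate_seeds(points, radius, min_size=2):
    """Hypothetical sketch: threshold graph + connected components.

    Returns (k_estimate, seeds), where seeds are component centroids.
    The pairwise loop is O(n^2), so this is for illustration only.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add an edge between every pair of points within `radius`.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    # Collect the members of each connected component.
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(points[i])

    # Keep components large enough to count as clusters; use centroids as seeds.
    seeds = []
    for members in groups.values():
        if len(members) >= min_size:
            dim = len(members[0])
            centroid = tuple(
                sum(p[d] for p in members) / len(members) for d in range(dim)
            )
            seeds.append(centroid)
    return len(seeds), seeds


# Made-up sample: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
k, seeds = estimate_seeds(points, radius=0.5)
print(k, seeds)
```

In this sketch, `radius` and `min_size` act as the tuning knobs: a larger `radius` merges nearby groups and lowers the estimate of k, while a larger `min_size` discards small components as noise. A very small estimate relative to the data size, or a single giant component, would suggest the data has little cluster structure at that scale.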

References (8)
Usama Fayyad, Cory Reina, P. S. Bradley. Scaling Clustering Algorithms to Large Databases. Knowledge Discovery and Data Mining, pp. 9-15 (1998).
Raymond T. Ng, Jiawei Han. Efficient and Effective Clustering Methods for Spatial Data Mining. Very Large Data Bases, pp. 144-155 (1994).
Usama M. Fayyad, Paul S. Bradley. Refining Initial Points for K-Means Clustering. International Conference on Machine Learning, pp. 91-99 (1998).
Gholamhosein Sheikholeslami, Surojit Chatterjee, Aidong Zhang. WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases. Very Large Data Bases, pp. 428-439 (1998).
Hans-Peter Kriegel, Martin Ester, Jörg Sander, Xiaowei Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Knowledge Discovery and Data Mining, pp. 226-231 (1996).
Richard C. Dubes, Anil K. Jain. Algorithms for Clustering Data (1988).
Tian Zhang, Raghu Ramakrishnan, Miron Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. International Conference on Management of Data, vol. 25, pp. 103-114 (1996). DOI: 10.1145/233269.233324
Thomas T. Cormen, Ronald L. Rivest, Charles E. Leiserson. Introduction to Algorithms (1990).