Progressive sampling schemes for approximate clustering in very large data sets

Authors: J.C. Bezdek, R.J. Hathaway

DOI: 10.1109/FUZZY.2004.1375677

Keywords: Data mining, FLAME clustering, Fuzzy clustering, Data stream clustering, CURE data clustering algorithm, Canopy clustering algorithm, Fuzzy set, Cluster analysis, Mathematics, Correlation clustering, Algorithm

Abstract: The extensible fast fuzzy c-means algorithm (eFFCM) finds clusters in very large digital images. eFFCM identifies a representative subsample of the image, which is then clustered using the literal fuzzy c-means (FCM) algorithm. The FCM solution is then extended to secure an approximate clustering of the remaining pixels in the image. This article discusses generalized eFFCM (geFFCM), an extension of eFFCM to general non-image data. Our extension offers two benefits. First, geFFCM accelerates literal FCM (LFCM) on all (loadable) data sets. Second, geFFCM provides feasibility: a way to find (approximate) clusters for data sets that are too large to be loaded into a single computer. Our experiments suggest that a chi-squared or divergence test of goodness of fit alone suffices to find good subsamples. The new subsampling method should be equally effective for acceleration and feasibility with very large (VL) data clustered by any algorithm (not just FCM).
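The three stages described above (progressive subsampling checked by a goodness-of-fit test, literal FCM on the accepted subsample, and a non-iterative extension to the remaining points) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' geFFCM code: all function names are hypothetical, and the per-feature chi-squared test on 10 equal-width bins, the doubling sample schedule, the Wilson-Hilferty approximation of the critical value, and the farthest-first initialization are choices made here because the abstract does not specify them.

```python
import numpy as np


def chi2_crit(df, z=1.6449):
    """Approximate upper-5% chi-squared critical value (Wilson-Hilferty approximation)."""
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * np.sqrt(h)) ** 3


def sample_fits(full_props, edges, sample):
    """True if every feature of `sample` passes a chi-squared goodness-of-fit test
    against the bin proportions of the full data set (assumed test design)."""
    n = len(sample)
    for j in range(sample.shape[1]):
        counts, _ = np.histogram(sample[:, j], bins=edges[j])
        expected = n * full_props[j]
        mask = expected > 0
        df = mask.sum() - 1
        if df < 1:
            continue  # a single occupied bin has nothing to test
        stat = np.sum((counts[mask] - expected[mask]) ** 2 / expected[mask])
        if stat > chi2_crit(df):
            return False
    return True


def progressive_sample(X, rng, n0=100, growth=2.0, bins=10):
    """Grow a random sample until it passes the fit test (hypothetical schedule)."""
    # One pass over the full data builds per-feature reference histograms.
    edges = [np.linspace(X[:, j].min(), X[:, j].max(), bins + 1) for j in range(X.shape[1])]
    props = [np.histogram(X[:, j], bins=e)[0] / len(X) for j, e in enumerate(edges)]
    perm = rng.permutation(len(X))
    n = n0
    while n < len(X):
        idx = perm[:n]
        if sample_fits(props, edges, X[idx]):
            return idx
        n = int(n * growth)
    return perm  # no sample passed: fall back to the whole data set


def memberships(X, V, m=2.0):
    """FCM membership update for points X (n, p) given fixed prototypes V (c, p)."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12  # squared distances (n, c)
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1


def fcm(X, c, m=2.0, iters=100, tol=1e-6, rng=None):
    """Literal FCM by alternating optimization; returns the c prototypes."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Farthest-first initialization (an assumption, not taken from the paper).
    V = [X[rng.integers(len(X))]]
    for _ in range(1, c):
        d2 = ((X[:, None, :] - np.array(V)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        V.append(X[d2.argmax()])
    V = np.array(V)
    for _ in range(iters):
        W = memberships(X, V, m) ** m
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted-mean prototype update
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return V


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.5, (5000, 2)), rng.normal(5.0, 0.5, (5000, 2))])
    idx = progressive_sample(X, rng)  # stage 1: progressive sampling
    V = fcm(X[idx], c=2)              # stage 2: literal FCM on the sample
    U = memberships(X, V)             # stage 3: one-shot extension to all points
    print(len(idx), np.round(V, 2))
```

In this sketch the extension step reuses the FCM membership formula with the sample's prototypes held fixed, so it needs only one pass over the data. For data that truly cannot be loaded, both the reference histograms and the extension would have to be computed chunk by chunk from storage, which the sketch does not attempt.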
