An empirical comparison of four initialization methods for the K-Means algorithm

Authors: J.M. Peña, J.A. Lozano, P. Larrañaga

DOI: 10.1016/S0167-8655(99)00069-0


Abstract: In this paper, we aim to compare empirically four initialization methods for the K-Means algorithm: random, Forgy, MacQueen and Kaufman. Although this algorithm is known for its robustness, it is widely reported in the literature that its performance depends upon two key points: the initial clustering and the instance order. We conduct a series of experiments to draw up (in terms of mean, maximum, minimum and standard deviation) the probability distribution of the square-error values of the final clusters returned by the K-Means algorithm independently of any initial clustering and of any instance order when each of the four initialization methods is used. The results of our experiments illustrate that the random and the Kaufman initialization methods outperform the rest of the compared methods, as they make the K-Means algorithm more effective and more independent of the initial clustering and the instance order. In addition, we compare the convergence speed of the K-Means algorithm when using each of the four initialization methods. Our results suggest that the Kaufman initialization method induces a more desirable behaviour in the K-Means algorithm with respect to convergence speed than the random initialization method.
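As a rough illustration of two of the initialization schemes the abstract compares, the sketch below implements Forgy initialization (pick K data points as seeds) and random-partition initialization (randomly assign points to K clusters and take the cluster means), together with the square-error criterion the paper uses to evaluate final clusterings. This is a minimal sketch from the standard textbook descriptions, not the authors' code; the function names and the empty-cluster re-seeding step are our own assumptions.

```python
import random

def forgy_init(points, k, rng):
    """Forgy initialization: pick k distinct data points as initial centroids."""
    return [list(p) for p in rng.sample(points, k)]

def cluster_mean(cluster):
    """Componentwise mean of a non-empty list of points."""
    dim = len(cluster[0])
    return [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]

def random_partition_init(points, k, rng):
    """Random initialization: assign every point to a random cluster,
    then use each cluster's mean as its initial centroid."""
    clusters = [[] for _ in range(k)]
    for p in points:
        clusters[rng.randrange(k)].append(p)
    # Guard against an empty cluster by re-seeding it with a random point
    # (an assumption; the paper does not specify how ties are handled).
    for c in clusters:
        if not c:
            c.append(rng.choice(points))
    return [cluster_mean(c) for c in clusters]

def square_error(points, centroids):
    """Sum over all points of the squared distance to the nearest centroid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in points
    )
```

Either initializer produces the starting centroids that a standard K-Means loop would then refine; the paper's experiments estimate the distribution of `square_error` over many random restarts of each scheme.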
