Genetic Algorithm for Finding Cluster Hierarchies

作者: Christian Böhm , Annahita Oswald , Christian Richter , Bianca Wackersreuther , Peter Wackersreuther

DOI: 10.1007/978-3-642-23088-2_26

关键词:

摘要: Hierarchical clustering algorithms have been studied extensively in the last years. However, existing approaches for hierarchical suffer from several drawbacks. The representation of results is often hard to interpret even large datasets. Many are not robust noise objects or overcome these limitation only by difficult parameter settings. As many heavily depend on their initialization, resulting get stuck a local optimum. In this paper, we propose novel geneticbased algorithm GACH (Genetic Algorithm finding Cluster Hierarchies) that solves those problems beneficial combination genetic algorithms, information theory and model-based clustering. capable find correct number model parameters using Minimum Description Length (MDL) principle does initialization use population-based stochastic search which ensures thorough exploration space. Moreover, outliers handled as they assigned appropriate inner nodes hierarchy root. An extensive evaluation synthetic well real data demonstrates superiority our over approaches.

参考文章(27)
Peter Grünwald, A Tutorial Introduction to the Minimum Description Length Principle arXiv: Statistics Theory. ,(2004)
Dan Pelleg, Andrew W. Moore, X-means: Extending K-means with Efficient Estimation of the Number of Clusters international conference on machine learning. pp. 727- 734 ,(2000)
Christian Böhm, Frank Fiedler, Annahita Oswald, Claudia Plant, Bianca Wackersreuther, Peter Wackersreuther, ITCH: information-theoretic cluster hierarchies european conference on machine learning. pp. 151- 167 ,(2010) , 10.1007/978-3-642-15880-3_16
Hans-Peter Kriegel, Martin Ester, Jörg Sander, Xiaowei Xu, A density-based algorithm for discovering clusters in large spatial Databases with Noise knowledge discovery and data mining. pp. 226- 231 ,(1996)
D Whitley, T Starkweather, C Bogart, Genetic algorithms and neural networks: optimizing connections and connectivity parallel computing. ,vol. 14, pp. 347- 361 ,(1990) , 10.1016/0167-8191(90)90086-O
Richard C. Dubes, Anil K. Jain, Algorithms for clustering data ,(1988)
Christian Böhm, Christos Faloutsos, Jia-Yu Pan, Claudia Plant, Robust information-theoretic clustering knowledge discovery and data mining. pp. 65- 75 ,(2006) , 10.1145/1150402.1150414
P. Scheunders, A comparison of clustering algorithms applied to color image quantization Pattern Recognition Letters. ,vol. 18, pp. 1379- 1384 ,(1997) , 10.1016/S0167-8655(97)00116-5
Sankar K. Pal, Dinabandhu Bhandari, Malay K. Kundu, Genetic algorithms for optimal image enhancement Pattern Recognition Letters. ,vol. 15, pp. 261- 271 ,(1994) , 10.1016/0167-8655(94)90058-2