Region-based crossover for clustering problems

Authors: Michael J. Laszlo, Jeevan D'Souza

Keywords: Data mining, Mathematics, Centroid, Crossover, Euclidean distance, Medoid, Genetic algorithm, Cluster analysis, Population, Data point

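Abstract: Data clustering, which partitions data points into clusters, has many useful applications in economics, science and engineering. Data clustering algorithms can be partitional or hierarchical. The k-means algorithm is the most widely used partitional clustering algorithm because of its simplicity and efficiency. One problem with the k-means algorithm is that the quality of the partitions it produces is highly dependent on the initial selection of centers. This problem has been tackled using genetic algorithms (GA), where a set of centers is encoded into an individual of a population and solutions are generated using evolutionary operators such as crossover, mutation and selection. Of the many GA methods, the region-based genetic algorithm (RBGA) has proven to be an effective technique when the centroid was used as the representative object of a cluster (ROC) and the Euclidean distance was used as the distance metric. The RBGA uses a region-based crossover operator that exchanges subsets of centers that belong to a region of space rather than exchanging random centers. The rationale is that subsets of centers that occupy a given region of space tend to serve as building blocks. Exchanging such centers preserves and propagates high-quality partial solutions. This research aims at assessing the RBGA with a variety of ROCs and distance metrics. The RBGA was tested along with other GA methods on four benchmark datasets, with different distance metrics, a varied number of centers, and both centroids and medoids as ROCs. The results obtained showed the superior performance of the RBGA across all datasets and sets of parameters, indicating that region-based crossover may prove an effective strategy across a broad range of clustering problems.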
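The idea of exchanging centers by region rather than at random can be illustrated with a minimal sketch. This is not the operator defined in the paper: the abstract does not specify how regions are chosen, so the sketch assumes the simplest possible region, an axis-aligned half-space given by a hypothetical `dim` and `threshold`. The function name `region_crossover` and the repair step that restores the number of centers are also assumptions made for illustration.

```python
import random


def region_crossover(parent_a, parent_b, dim, threshold, rng=None):
    """Illustrative region-based crossover (assumed form, not the paper's).

    Each parent is a list of k centers (tuples of coordinates).
    The region is the half-space where coordinate `dim` < `threshold`.
    The child takes parent A's centers inside the region and
    parent B's centers outside it.
    """
    rng = rng or random.Random(0)
    k = len(parent_a)

    def in_region(center):
        return center[dim] < threshold

    child = [c for c in parent_a if in_region(c)]
    child += [c for c in parent_b if not in_region(c)]

    # Assumed repair step: the exchange can yield more or fewer than k
    # centers, so pad from the unused centers or trim at random.
    leftovers = [c for c in parent_a if not in_region(c)]
    leftovers += [c for c in parent_b if in_region(c)]
    rng.shuffle(leftovers)
    while len(child) < k and leftovers:
        child.append(leftovers.pop())
    if len(child) > k:
        child = rng.sample(child, k)
    return child


# Two parents with k = 2 centers each; the region is x < 2.
a = [(0, 0), (5, 5)]
b = [(1, 1), (6, 6)]
print(region_crossover(a, b, dim=0, threshold=2))  # [(0, 0), (6, 6)]
```

In this example the child keeps the center that parent A places in the region and the center that parent B places outside it, so each parent's local arrangement is inherited intact. A uniform random exchange could instead pair two centers from the same region and leave the other region uncovered.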
