A Generalized Clustering Method Based on Validity Indices and Membership Functions

Authors: Edwin Aldana-Bobadilla, Ivan Lopez-Arevalo, Hiram Galeana-Zapien, Melesio Crespo-Sanchez

DOI: 10.1109/ACCESS.2018.2882408

Keywords: Partition (database), Artificial intelligence, Linear programming, Cluster analysis, Pattern recognition, Genetic algorithm, Computer science

Abstract: Clustering is an important task in data analysis to find a partition on an unlabeled dataset based on similarity relationships among its elements. Typically, such similarity is determined by a proximity measure or distance. Then, the optimal partition is the one that minimizes the distance between elements belonging to the same subset and maximizes the distance between elements from different subsets. The way in which the partition is found is called a clustering method. The adequateness of the partition is commonly determined in terms of a validity index. In this paper, we propose a clustering method referred to as quality-driven search for optimal clustering (QDSOC), where the search process is directly driven by a validity index instead of a proximity measure. Our approach allows efficiently exploring a large solution space via a breed of genetic algorithm, the so-called eclectic genetic algorithm. Unlike existing clustering methods, the proposed QDSOC provides a mathematical model representation based on membership functions. This model describes how the points belong to the subsets found. Thus, using this model, we can predict the membership of new objects without performing the clustering process again. As part of the experimental evaluation, our method is compared with k-means and self-organizing maps (SOMs), which are two well-known clustering approaches. The methods were used to solve a wide sample of clustering problems, using three validity indices. From the obtained results, we demonstrate that QDSOC statistically outperforms k-means and SOMs. We also point out that QDSOC does not incur excessive computational overhead with respect to traditional methods.
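To make the idea of a search driven by a validity index more concrete, the following is a minimal Python sketch, not the authors' QDSOC implementation. Everything in it is an assumption made for illustration: the Dunn-style `validity_index`, the simple elitist mutation-only search (a stand-in for the eclectic genetic algorithm, which is not specified in this record), the toy two-blob dataset, and the nearest-mean rule used in place of the paper's membership functions to label new objects without re-running the search.

```python
import math
import random


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def cluster_means(points, labels, k):
    """Mean of each cluster, or None if any cluster is empty."""
    groups = [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]
    if any(not g for g in groups):
        return None
    return [tuple(sum(axis) / len(g) for axis in zip(*g)) for g in groups]


def validity_index(points, labels, k):
    """Hypothetical Dunn-style score: separation / spread (higher is better)."""
    means = cluster_means(points, labels, k)
    if means is None:
        return 0.0  # degenerate partition with an empty cluster
    spread = max(
        sum(math.dist(p, means[lab]) for p, lab in zip(points, labels) if lab == j)
        / labels.count(j)
        for j in range(k)
    )
    separation = min(math.dist(means[a], means[b])
                     for a in range(k) for b in range(a + 1, k))
    return separation / (spread + 1e-12)


def fitness(points, centers):
    """Fitness of a candidate is the validity index of the partition it induces."""
    return validity_index(points, assign(points, centers), len(centers))


def search(points, k=2, pop_size=20, generations=40, sigma=0.5, seed=0):
    """Elitist evolutionary search; an assumed stand-in for the eclectic GA."""
    rng = random.Random(seed)
    population = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        # Each parent yields one child by Gaussian mutation of its centers.
        children = [[tuple(x + rng.gauss(0, sigma) for x in c) for c in parent]
                    for parent in population]
        pool = population + children
        pool.sort(key=lambda cand: fitness(points, cand), reverse=True)
        population = pool[:pop_size]  # keep the best candidates
    return population[0]


# Toy data (assumption): two well-separated 2-D Gaussian blobs.
rng = random.Random(42)
points = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
          + [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(30)])

best = search(points, k=2)
labels = assign(points, best)
means = cluster_means(points, labels, 2)

# Label new objects from the fitted cluster means, without searching again.
new_labels = assign([(0.2, -0.1), (9.8, 10.1)], means)
```

In this sketch the fitness of a candidate depends only on the partition it induces, so the search is guided by partition quality rather than by a within-cluster distance objective, as it is in k-means. Swapping in a different validity index only requires replacing `validity_index`.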
