A consistency-based validation for data clustering

Authors: Bing Zhu, Changzheng He, Xiaoyi Jiang

DOI: 10.3233/IDA-150727

Keywords: Constrained clustering; Artificial intelligence; Fuzzy clustering; Determining the number of clusters in a data set; Correlation clustering; Data mining; Cluster analysis; Computer science; Canopy clustering algorithm; Data stream clustering; CURE data clustering algorithm; Machine learning

Abstract: Clustering analysis is a powerful tool in customer segmentation. Although various clustering algorithms have been proposed, the determination of the optimal number of clusters remains a difficult issue. In this paper, a clustering method based on a consistency criterion is proposed to address this issue. The main characteristic of the new approach is that it requires little prior information and can find the optimal number of clusters automatically. Extensive comparisons are done over 22 real-world datasets from different domains, in which four well-known clustering algorithms in combination with six indices are used as benchmark methods. The results demonstrate the superiority of our approach in appropriately determining the number of clusters. An application to the segmentation of credit card users is also illustrated.

References (41)
R.F. Harrison, Y. Ding, Relational visual cluster validity. Elsevier B.V. (2007)
Aleksei Grigorevich Ivakhnenko, Hema R. Madala, Inductive Learning Algorithms for Complex Systems Modeling. (1994)
L. Guerra, V. Robles, C. Bielza, P. Larrañaga, A comparison of clustering quality indices using outliers and noise. Intelligent Data Analysis, vol. 16, pp. 703-715 (2012), 10.3233/IDA-2012-0545
Hans-Peter Kriegel, Martin Ester, Jörg Sander, Xiaowei Xu, A density-based algorithm for discovering clusters in large spatial databases with noise. Knowledge Discovery and Data Mining, pp. 226-231 (1996)
Ravi Kothari, Dax Pitts, On finding the number of clusters. Pattern Recognition Letters, vol. 20, pp. 405-416 (1999), 10.1016/S0167-8655(99)00008-2
Lawrence Hubert, James Schultz, Quadratic assignment as a general data analysis strategy. British Journal of Mathematical and Statistical Psychology, vol. 29, pp. 190-241 (1976), 10.1111/J.2044-8317.1976.TB00714.X
Yunfei Ding, Robert F. Harrison, Relational visual cluster validity (RVCV). Pattern Recognition Letters, vol. 28, pp. 2071-2079 (2007), 10.1016/J.PATREC.2007.06.002
Rui Xu, Jie Xu, D. C. Wunsch, A Comparison Study of Validity Indices on Swarm-Intelligence-Based Clustering. Systems, Man, and Cybernetics, vol. 42, pp. 1243-1256 (2012), 10.1109/TSMCB.2012.2188509
Malay K. Pakhira, Sanghamitra Bandyopadhyay, Ujjwal Maulik, Validity index for crisp and fuzzy clusters. Pattern Recognition, vol. 37, pp. 487-501 (2004), 10.1016/J.PATCOG.2003.06.005
Olatz Arbelaitz, Ibai Gurrutxaga, Javier Muguerza, Jesús M. Pérez, Iñigo Perona, An extensive comparative study of cluster validity indices. Pattern Recognition, vol. 46, pp. 243-256 (2013), 10.1016/J.PATCOG.2012.07.021