Cluster validation in k-Means clustering of mixed databases based on principal component analysis

Authors: Ryoichi Nonoguchi, Katsuhiro Honda, Akira Notsu, Hidetomo Ichihashi

DOI: 10.1109/SCIS-ISIS.2012.6505102

Keywords: Database, Artificial intelligence, k-medians clustering, Computer science, Rotation (mathematics), Cluster analysis, Single-linkage clustering, Relation (database), Data mining, Categorical variable, k-means clustering, Principal component analysis, Pattern recognition

Abstract: Considering the close relation between k-Means clustering and principal component analysis (PCA), a cluster validation approach for k-Means partitions was proposed using analytical solutions of PCA. In this paper, the approach is further extended to handle mixed databases composed of not only numerical observations but also categorical observations. For mixed databases, the analytical solutions of PCA are given by considering optimal scaling of the categorical observations, and the plausibility of k-Means partitions is evaluated by calculating deviations from the analytical solutions after Procrustean rotation.
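The validation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: categorical observations are dummy-coded and z-scored as a crude stand-in for optimal scaling, the "analytical solution of PCA" is taken to be the constant vector plus the leading k-1 normalized principal component scores (following the k-Means/PCA relation of Ding and He), and the deviation is the squared Frobenius distance to the normalized cluster indicator matrix after an orthogonal Procrustes rotation. The function names `encode_mixed`, `kmeans`, and `pca_deviation` and the toy data are hypothetical.

```python
import numpy as np

def encode_mixed(numeric, categorical):
    """Stack a numeric column with dummy-coded categories, then z-score.
    ASSUMPTION: dummy coding stands in for the optimal scaling step."""
    cats = sorted(set(categorical))
    dummies = np.array([[1.0 if v == c else 0.0 for c in cats] for v in categorical])
    num = np.asarray(numeric, dtype=float).reshape(len(categorical), -1)
    X = np.hstack([num, dummies])
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns hard cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def pca_deviation(X, labels, k):
    """Squared Frobenius deviation between the PCA-based solution and the
    normalized cluster indicators after Procrustean rotation (smaller means
    the partition agrees better with the PCA solution)."""
    n = len(X)
    Xc = X - X.mean(axis=0)
    # Left singular vectors of the centered data are the normalized PC scores.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    # ASSUMPTION: constant vector plus k-1 leading components as the reference.
    P = np.hstack([np.full((n, 1), 1.0 / np.sqrt(n)), U[:, :k - 1]])
    # Normalized indicator matrix: column j equals 1/sqrt(n_j) on cluster j.
    H = np.zeros((n, k))
    H[np.arange(n), labels] = 1.0
    H /= np.sqrt(np.maximum(H.sum(axis=0), 1.0))
    # Orthogonal Procrustes: R minimizes ||P R - H||_F over orthogonal R.
    A, _, Bt = np.linalg.svd(P.T @ H)
    R = A @ Bt
    return float(np.linalg.norm(P @ R - H) ** 2)

# Toy mixed data (hypothetical): one numeric and one categorical attribute.
numeric = [0.0, 0.2, 0.1, 0.3, 10.0, 10.2, 10.1, 10.3]
categorical = ["a", "a", "a", "a", "b", "b", "b", "b"]
X = encode_mixed(numeric, categorical)
labels = kmeans(X, k=2)
dev_kmeans = pca_deviation(X, labels, k=2)
dev_arbitrary = pca_deviation(X, np.array([0, 1] * 4), k=2)
print(dev_kmeans, dev_arbitrary)
```

On this toy data the k-Means partition should give a much smaller deviation than the arbitrary alternating labeling, which is the sense in which a small deviation indicates a plausible partition.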
