Model selection for semi-supervised clustering

Authors: Arthur Zimek, Davoud Moulavi, Ricardo J. G. B. Campello, Jörg Sander, Randy Goebel

DOI: 10.5441/002/EDBT.2014.31

Keywords: Brown clustering; Correlation clustering; Computer science; Constrained clustering; Fuzzy clustering; Cluster analysis; Machine learning; Artificial intelligence; Determining the number of clusters in a data set; CURE data clustering algorithm; Data mining; Canopy clustering algorithm

Abstract: Although there is a large and growing literature that tackles the semi-supervised clustering problem (i.e., using some labeled objects or cluster-guiding constraints like "must-link" or "cannot-link"), the evaluation of semi-supervised clustering approaches has rarely been discussed. The application of cross-validation techniques, for example, is far from straightforward in the semi-supervised setting, yet the problems associated with evaluation have to be addressed. Here we summarize these problems and provide a solution. Furthermore, in order to demonstrate the practical applicability of semi-supervised clustering methods, we provide a method for model selection in semi-supervised clustering based on this sound evaluation procedure. Our method allows the user to select, based on the available information (labels or constraints), the most appropriate clustering model (e.g., number of clusters, density-parameters) for a given problem.
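
The idea of selecting a clustering model from the available labels can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that candidate clusterings (one per parameter value) have already been computed without access to a set of held-out labeled objects, and it scores each candidate by a pairwise F-measure over the must-link and cannot-link pairs implied by those held-out labels. The function names `pairwise_f_measure` and `select_model`, the choice of score, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_f_measure(assignment, held_out_labels):
    """F-measure over all pairs of held-out labeled objects.

    assignment: dict mapping object id -> cluster id (from any clustering run)
    held_out_labels: dict mapping object id -> class label that the
        clustering run was NOT allowed to see (assumed setup)
    """
    tp = fp = fn = 0
    for a, b in combinations(sorted(held_out_labels), 2):
        same_class = held_out_labels[a] == held_out_labels[b]  # implied must-link
        same_cluster = assignment[a] == assignment[b]
        if same_cluster and same_class:
            tp += 1  # must-link satisfied
        elif same_cluster:
            fp += 1  # cannot-link violated
        elif same_class:
            fn += 1  # must-link violated
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_model(candidates, held_out_labels):
    """Return the parameter value whose clustering agrees best with the labels.

    candidates: dict mapping parameter value (e.g. number of clusters)
        -> assignment dict as accepted by pairwise_f_measure
    """
    return max(
        candidates,
        key=lambda p: pairwise_f_measure(candidates[p], held_out_labels),
    )


# Toy data (hypothetical): four held-out objects from two classes.
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
candidates = {
    1: {0: 0, 1: 0, 2: 0, 3: 0},  # everything in one cluster
    2: {0: 0, 1: 0, 2: 1, 3: 1},  # matches the two classes
    4: {0: 0, 1: 1, 2: 2, 3: 3},  # every object in its own cluster
}
best_k = select_model(candidates, labels)
```

In this toy example the two-cluster candidate satisfies every implied constraint and is selected. A full evaluation would repeat the scoring over several cross-validation folds and average the results; how labels or constraints are split into folds without leaking information is the kind of problem the abstract refers to and is not addressed by this sketch.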
