Authors: Arthur Zimek, Davoud Moulavi, Ricardo J. G. B. Campello, Jörg Sander, Randy Goebel
Keywords: Brown clustering, Correlation clustering, Computer science, Constrained clustering, Fuzzy clustering, Cluster analysis, Machine learning, Artificial intelligence, Determining the number of clusters in a data set, CURE data clustering algorithm, Data mining, Canopy clustering algorithm
Abstract: Although there is a large and growing literature that tackles the semi-supervised clustering problem (i.e., using some labeled objects or cluster-guiding constraints like "must-link" or "cannot-link"), the evaluation of semi-supervised clustering approaches has rarely been discussed. The application of cross-validation techniques, for example, is far from straightforward in the semi-supervised setting, yet the problems associated with evaluation have yet to be addressed. Here we summarize these problems and provide a solution. Furthermore, in order to demonstrate the practical applicability of semi-supervised clustering methods, we provide a method for model selection in semi-supervised clustering based on this sound evaluation procedure. Our method allows the user to select, based on the available information (labels or constraints), the most appropriate clustering model (e.g., number of clusters, density parameters) for a given problem.