Active Sampling for Constrained Clustering

Authors: Masayuki Okabe, Seiji Yamada

DOI: 10.20965/JACIII.2014.P0232

Keywords: CURE data clustering algorithm, Canopy clustering algorithm, Single-linkage clustering, Constrained clustering, Cluster analysis, FLAME clustering, Consensus clustering, Correlation clustering, Computer science, Data mining

Abstract: Constrained clustering is a framework for improving clustering performance by using supervised information, which is generally given as a set of constraints about data pairs. Since the performance of constrained clustering depends on the set of constraints to use, we need a method to select good constraints that are expected to promote clustering performance. In this paper, we propose such a method, which actively selects the data pairs to be constrained by using variance in each iteration. This method consists of a bagging-based cluster ensemble algorithm that integrates a set of clusters produced by constrained k-means with randomly ordered data assignment. Experimental results show that our method outperforms a random sampling method.
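
The selection loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the "variance" of a pair is assumed to be p(1 - p), where p is the fraction of bagged runs that put the two points in the same cluster, the constrained k-means falls back to the nearest center when no cluster satisfies every constraint, and the human oracle is simulated with ground-truth labels `y`.

```python
import numpy as np

def neighbours(pairs, index_of):
    """Map global constraint pairs to local neighbour sets for one bag."""
    nb = {}
    for i, j in pairs:
        if i in index_of and j in index_of:
            a, b = index_of[i], index_of[j]
            nb.setdefault(a, set()).add(b)
            nb.setdefault(b, set()).add(a)
    return nb

def constrained_kmeans(X, k, must, cannot, rng, n_iter=10):
    """Greedy constrained k-means; points are assigned in a random order."""
    n = len(X)
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        labels[:] = -1
        for i in rng.permutation(n):
            order = np.argsort(((centers - X[i]) ** 2).sum(axis=1))
            labels[i] = order[0]  # assumption: fall back to nearest center
            for c in order:
                ml_ok = all(labels[j] in (-1, c) for j in must.get(i, ()))
                cl_ok = all(labels[j] != c for j in cannot.get(i, ()))
                if ml_ok and cl_ok:
                    labels[i] = c
                    break
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def coassociation(X, k, must, cannot, rng, n_bags=20):
    """Bagging ensemble: fraction of runs in which each pair is co-clustered."""
    n = len(X)
    together = np.zeros((n, n))
    seen = np.zeros((n, n))
    for _ in range(n_bags):
        idx = np.unique(rng.choice(n, size=n, replace=True))  # bootstrap sample
        index_of = {int(g): l for l, g in enumerate(idx)}
        labels = constrained_kmeans(
            X[idx], k, neighbours(must, index_of), neighbours(cannot, index_of), rng
        )
        block = np.ix_(idx, idx)
        together[block] += labels[:, None] == labels[None, :]
        seen[block] += 1
    return together / np.maximum(seen, 1)

def select_pair(P, known):
    """Pick the unconstrained pair (i < j) with the largest p * (1 - p)."""
    var = P * (1.0 - P)
    var[np.tril_indices_from(var)] = -1.0  # ignore the diagonal and i > j
    for i, j in known:
        var[min(i, j), max(i, j)] = -1.0   # never query a pair twice
    i, j = np.unravel_index(np.argmax(var), var.shape)
    return int(i), int(j)

def active_sampling(X, y, k, n_queries, seed=0):
    """Iteratively query the most uncertain pair; y stands in for the oracle."""
    rng = np.random.default_rng(seed)
    must, cannot = [], []
    for _ in range(n_queries):
        P = coassociation(X, k, must, cannot, rng)
        i, j = select_pair(P, must + cannot)
        (must if y[i] == y[j] else cannot).append((i, j))
    return must, cannot
```

Calling `active_sampling(X, y, k=2, n_queries=10)` on a labeled toy data set returns two lists of index pairs, the must-link and cannot-link constraints gathered so far. Pairs on which the bagged runs disagree most are queried first, which is one plausible reading of variance-based selection.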
