The impact of semi-supervised clustering on text classification

Authors: Antonia Kyriakopoulou, Theodore Kalamboukis

DOI: 10.1145/2491845.2491866

Keywords: Computer science; Machine learning; CURE data clustering algorithm; Brown clustering; Correlation clustering; Pattern recognition; Clustering high-dimensional data; Conceptual clustering; Canopy clustering algorithm; Fuzzy clustering; Cluster analysis; Artificial intelligence

Abstract: This paper addresses the problem of learning to classify texts by exploiting information derived from clustering both the training and the test sets. Incorporating the knowledge that results from clustering into the feature space representation is expected to boost the performance of a classifier. Two different approaches are described: an unsupervised one and a semi-supervised one. We present an empirical study of the proposed algorithms on a variety of datasets. The results are encouraging, revealing that the proposed approaches can create high-accuracy text classifiers.
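The abstract states only that knowledge from clustering the training and test sets is folded into the feature representation. The following is a minimal sketch of one way this could work, in plain Python, assuming a scheme the abstract does not specify. Documents from both sets are clustered together with a cosine (spherical) k-means, and each document vector is extended with an indicator feature for its cluster before a classifier is trained. In the unsupervised variant the labels are ignored during clustering. In the semi-supervised variant the clusters are seeded from the labeled training documents, following the seeding idea of "Semi-supervised Clustering by Seeding" listed in the references. All function names, the `weight` parameter, the first-k initialization, and the nearest-centroid (Rocchio-style) classifier are hypothetical choices for illustration, not the authors' algorithms.

```python
from collections import Counter
import math


def normalize(v):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tf_vectors(docs, vocab):
    """Unit-length term-frequency vectors over a fixed vocabulary."""
    index = {t: i for i, t in enumerate(vocab)}
    vecs = []
    for doc in docs:
        v = [0.0] * len(vocab)
        for tok, c in Counter(doc.split()).items():
            if tok in index:
                v[index[tok]] = float(c)
        vecs.append(normalize(v))
    return vecs


def class_centroids(vecs, labels):
    """Normalized mean vector per label, in first-seen label order."""
    sums = {}
    for v, y in zip(vecs, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
    return {y: normalize(s) for y, s in sums.items()}


def spherical_kmeans(vecs, init, iters=10):
    """Cosine k-means from the given initial centroids; returns cluster ids."""
    centroids = [normalize(c) for c in init]
    k = len(centroids)
    assign = [0] * len(vecs)
    for _ in range(iters):
        assign = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vecs]
        for j in range(k):
            members = [v for v, a in zip(vecs, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return assign


def augment(vecs, assign, k, weight=1.0):
    """Append k cluster-indicator features to each vector, then renormalize."""
    out = []
    for v, a in zip(vecs, assign):
        extra = [0.0] * k
        extra[a] = weight
        out.append(normalize(v + extra))
    return out


def classify_with_clusters(train_docs, train_labels, test_docs,
                           k=2, weight=1.0, seeded=False):
    """Cluster train+test documents together, augment features, classify test docs."""
    docs = train_docs + test_docs
    n = len(train_docs)
    vocab = sorted({t for d in docs for t in d.split()})
    vecs = tf_vectors(docs, vocab)
    if seeded:
        # Semi-supervised variant (hypothetical): seeded k-means with one
        # cluster per class, initialized from the labeled training documents.
        init = list(class_centroids(vecs[:n], train_labels).values())
    else:
        # Unsupervised variant: labels unused; first k documents as seeds (assumption).
        if not 1 <= k <= len(vecs):
            raise ValueError("k must be between 1 and the number of documents")
        init = vecs[:k]
    assign = spherical_kmeans(vecs, init)
    aug = augment(vecs, assign, len(init), weight)
    # Nearest-centroid classifier: a stand-in, not the paper's classifier.
    centroids = class_centroids(aug[:n], train_labels)
    preds = [max(centroids, key=lambda y: dot(v, centroids[y])) for v in aug[n:]]
    return preds, assign


train = ["goal match team", "election vote party"]
labels = ["sport", "politics"]
test = ["team wins match", "party wins vote"]
preds, clusters = classify_with_clusters(train, labels, test, k=2)
print(preds)     # ['sport', 'politics']
print(clusters)  # [0, 1, 0, 1]
```

Because the test documents take part in clustering, a test document that lands in the same cluster as a labeled document gains a shared feature with it even where their vocabularies differ. The hypothetical `weight` parameter controls how strongly that cluster signal counts relative to the term features.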

References (25)
George Karypis. CLUTO - A Clustering Toolkit. Defense Technical Information Center, 2002. DOI: 10.21236/ADA439508
Alexander Strehl, Joydeep Ghosh. A Scalable Approach to Balanced, High-Dimensional Clustering of Market-Baskets. IEEE International Conference on High Performance Computing, Data, and Analytics, pp. 525-536, 2000. DOI: 10.1007/3-540-44467-X_48
Gerard Salton, Michael J. McGill. Introduction to Modern Information Retrieval, 1983.
Hwanjo Yu, Jiong Yang, Jiawei Han. Classifying large data sets using SVMs with hierarchical clusters. Knowledge Discovery and Data Mining, pp. 306-315, 2003. DOI: 10.1145/956750.956786
Glenn Fung, O. L. Mangasarian. Semi-supervised support vector machines for unlabeled data classification. Optimization Methods & Software, vol. 15, pp. 29-44, 2001. DOI: 10.1080/10556780108805809
Bhavani Raskutti, Herman Ferrá, Adam Kowalczyk. Combining clustering and co-training to enhance text classification using unlabelled data. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), pp. 620-625, 2002. DOI: 10.1145/775047.775139
Antonia Kyriakopoulou, Theodore Kalamboukis. Using clustering to enhance text classification. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 805-806, 2007. DOI: 10.1145/1277741.1277918
L. Douglas Baker, Andrew Kachites McCallum. Distributional clustering of words for text classification. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 96-103, 1998. DOI: 10.1145/290941.290970
Arindam Banerjee, Raymond J. Mooney, Sugato Basu. Semi-supervised Clustering by Seeding. International Conference on Machine Learning, pp. 27-34, 2002.