Text Classification Aided by Clustering: a Literature Review

Author: Antonia Kyriakopoulou

DOI: 10.5772/6083

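Abstract: Supervised and unsupervised learning have been the focus of critical research in the areas of machine learning and artificial intelligence. In the literature, these two streams flow independently of each other, despite their close conceptual and practical connections. In this work we deal exclusively with the "text classification aided by clustering" scenario. This chapter provides a review and interpretation of the role of clustering in different fields of text classification, with an eye towards identifying the important areas of research. Drawing upon the literature review and analysis, we discuss several important research issues surrounding text classification tasks and the role of clustering in support of these tasks. We define the problem, postulate a number of baseline methods, examine the techniques used, and classify them into meaningful categories. A standard research issue for text classification is the creation of compact representations of the feature space and the discovery of the complex relationships that exist between features, documents and classes. There are several approaches that try to quantify the notion of information for the basic components of a text classification problem. Given the variables of interest, sources of information about these variables can be compressed while preserving their information. Clustering is one of the approaches used in this context. In this vein, an important area of research where clustering is used to aid text classification is dimensionality reduction. Clustering is used as a feature compression and/or extraction method: features are clustered into groups based on selected clustering criteria. Feature clustering methods create new, reduced-size event spaces by joining similar features into groups. They define a similarity measure between features and collapse similar features into single events that no longer distinguish among their constituent features. Typically, the parameters of the cluster become the weighted average of the parameters of its constituent features. Two types of clustering have been studied: i) one-way clustering, i.e. feature clustering based on the distributions of features in the documents or classes, and ii) co-clustering, i.e. clustering both features and documents. A second research area of text classification where clustering has a lot to offer is semi-supervised learning. Training data contain both labelled and unlabelled examples. Obtaining a fully labelled training set is a difficult task; labelling is usually done using human expertise, which is expensive, time consuming, and error prone. Obtaining unlabelled data is much easier since it involves collecting data known to belong ...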
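The feature clustering step described in the abstract can be illustrated with a minimal sketch. It assumes the cluster assignment is already given (the chapter surveys several ways to obtain one), and it takes the "parameters" of a feature to be its class distribution P(class | feature), weighted by the feature prior P(feature); that choice, along with the function names and the toy data, is an assumption made for illustration and is not taken from the chapter.

```python
from collections import defaultdict


def collapse_features(doc_counts, assignment):
    """Sum the counts of all features that share a cluster, per document."""
    collapsed = []
    for counts in doc_counts:
        merged = defaultdict(int)
        for feature, count in counts.items():
            merged[assignment[feature]] += count
        collapsed.append(dict(merged))
    return collapsed


def cluster_class_distribution(assignment, feature_prior, class_given_feature):
    """Weighted average of P(class | feature) over each cluster's members,
    with weights proportional to the feature priors P(feature)."""
    mass = defaultdict(float)
    weighted = defaultdict(lambda: defaultdict(float))
    for feature, cluster in assignment.items():
        p = feature_prior[feature]
        mass[cluster] += p
        for label, prob in class_given_feature[feature].items():
            weighted[cluster][label] += p * prob
    return {
        cluster: {label: total / mass[cluster] for label, total in dist.items()}
        for cluster, dist in weighted.items()
    }


# Hypothetical toy data: two documents, four word features, two clusters.
docs = [{"goal": 2, "match": 1, "vote": 0}, {"vote": 3, "election": 1}]
assignment = {
    "goal": "c_sport",
    "match": "c_sport",
    "vote": "c_politics",
    "election": "c_politics",
}
feature_prior = {"goal": 0.3, "match": 0.1, "vote": 0.4, "election": 0.2}
class_given_feature = {
    "goal": {"sport": 0.9, "politics": 0.1},
    "match": {"sport": 0.7, "politics": 0.3},
    "vote": {"sport": 0.1, "politics": 0.9},
    "election": {"sport": 0.0, "politics": 1.0},
}

# Documents re-expressed over the reduced event space (clusters, not words).
reduced_docs = collapse_features(docs, assignment)
# Per-cluster class distribution as a prior-weighted average of its members.
cluster_dist = cluster_class_distribution(
    assignment, feature_prior, class_given_feature
)
```

After collapsing, "goal" and "match" are a single event, so a classifier trained on `reduced_docs` can no longer tell them apart; this is the trade-off the abstract describes between a smaller feature space and the loss of distinctions among constituent features.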
