Authors: Xue Zhang, Wang-xin Xiao
DOI: 10.1109/ICSAI.2012.6223496
Keywords:
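Abstract: Clustering aided classification methods are based on the assumption that the clusters learned under the guidance of the initial training data can somewhat characterize the underlying distribution of the data set. However, our experiments show that whether such an assumption holds depends on both the separability of the considered data set and the size of the training data set. It is often violated on data sets of bad separability, especially when the initial training data are too few. In this case, clustering based methods would perform worse. In this paper, we propose a clustering based two-stage text classification approach to address the above problem. In the first stage, labeled and unlabeled data are clustered with the guidance of the labeled data. Then a self-training style clustering strategy is used to iteratively expand the training data under the guidance of an oracle or expert. At the second stage, discriminative classifiers can subsequently be trained with the expanded labeled data set. Unlike other clustering based methods, the proposed clustering strategy can effectively cope with data of bad separability. Furthermore, the proposed framework converts the problem of sparsely labeled text classification into a supervised one; therefore, supervised classification models, e.g. SVM, can be applied, and techniques for supervised learning can be used to further improve the classification accuracy, such as feature selection, sampling methods, and data editing or noise filtering. Our experimental results demonstrated the effectiveness of the proposed approach, especially when the size of the training data set is very small.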
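
The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Dense 2-D vectors stand in for text features. The stage-one clustering is a nearest-centroid assignment seeded by the labeled points. The oracle is a hypothetical callable that returns the true label. The rule for choosing which points to send to the oracle (those closest to each centroid) is an assumption, since the abstract does not specify it. The stage-two classifier is a small binary linear SVM trained by subgradient descent on the hinge loss. All function names and parameters are hypothetical.

```python
import random


def centroid(points):
    """Mean vector of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def expand_training_set(X, labeled, oracle, rounds=5, per_round=2):
    """Stage 1 (assumed form): seeded clustering with oracle-guided expansion.

    labeled maps a sample index to its label; oracle(i) returns the label
    of sample i. Each round re-estimates one centroid per class from the
    current labeled set, assigns unlabeled samples to the nearest centroid,
    and asks the oracle about the samples closest to each centroid.
    """
    labeled = dict(labeled)
    for _ in range(rounds):
        classes = sorted(set(labeled.values()))
        cents = {c: centroid([X[i] for i, y in labeled.items() if y == c])
                 for c in classes}
        unlabeled = [i for i in range(len(X)) if i not in labeled]
        if not unlabeled:
            break
        assign = {i: min(classes, key=lambda c: sq_dist(X[i], cents[c]))
                  for i in unlabeled}
        for c in classes:
            members = sorted((i for i in unlabeled if assign[i] == c),
                             key=lambda i: sq_dist(X[i], cents[c]))
            for i in members[:per_round]:
                labeled[i] = oracle(i)  # the oracle's answer is what is stored
    return labeled


def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Stage 2: binary linear SVM (labels -1/+1), hinge-loss subgradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # L2 regularization step
            if margin < 1:  # sample violates the margin: hinge-loss step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy data: two well-separated blobs, one labeled seed per class.
rng = random.Random(0)
X = ([[rng.gauss(-3, 1), rng.gauss(-3, 1)] for _ in range(20)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)])
truth = [-1] * 20 + [1] * 20
seeds = {0: -1, 20: 1}

expanded = expand_training_set(X, seeds, oracle=lambda i: truth[i])
idx = sorted(expanded)
model = train_linear_svm([X[i] for i in idx], [expanded[i] for i in idx])
accuracy = sum(predict(model, x) == t for x, t in zip(X, truth)) / len(X)
print(len(expanded), accuracy)
```

The toy data here are easily separable, so the sketch only shows how the two stages connect. It does not reproduce the bad-separability setting that the abstract targets.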