Improving semi-supervised support vector machines through unlabeled instances selection

Authors: Yu-Feng Li, Zhi-Hua Zhou

Abstract: Semi-supervised support vector machines (S3VMs) are a popular family of approaches that try to improve learning performance by exploiting unlabeled data. Though S3VMs have been found helpful in many situations, they may degenerate performance, and the resultant generalization ability may be even worse than that obtained using labeled data only. In this paper, we try to reduce the chance of performance degeneration of S3VMs. Our basic idea is that, rather than exploiting all unlabeled data, the unlabeled instances should be selected such that only the ones that are very likely to be helpful are exploited, while some highly risky unlabeled instances are avoided. We propose the S3VM-us method, which uses hierarchical clustering to select the unlabeled instances. Experiments on a broad range of data sets over eighty-eight different settings show that the chance of performance degeneration of S3VM-us is much smaller than that of existing S3VMs.
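
The selection idea in the abstract can be illustrated with a small sketch. This is not the authors' S3VM-us algorithm, whose details the abstract does not give. It assumes a hypothetical confidence score: run single-linkage agglomerative clustering, record the merge height at which each unlabeled point's cluster first contains a positive and a negative labeled point, and treat a large gap between the two heights as a sign that the point is safe to exploit. The function names `clustering_confidence` and `select_unlabeled` and the parameter `keep_fraction` are invented for this illustration.

```python
import math
from itertools import combinations


def clustering_confidence(points, labels):
    """Score unlabeled points with single-linkage agglomerative clustering.

    points: list of coordinate tuples.
    labels: +1 or -1 for labeled points, None for unlabeled ones.
    Returns {index: (confidence, pseudo_label)} for every unlabeled index.
    """
    if not {+1, -1} <= set(labels):
        raise ValueError("need at least one labeled point per class")

    clusters = {i: {i} for i in range(len(points))}
    # Merge height at which a point's cluster first contains each class.
    first_seen = {i: {+1: None, -1: None} for i in clusters}
    for i, y in enumerate(labels):
        if y is not None:
            first_seen[i][y] = 0.0

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j])
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        height = linkage(a, b)
        merged = clusters[a] | clusters[b]
        present = {labels[i] for i in merged if labels[i] is not None}
        for i in merged:
            for y in present:
                if first_seen[i][y] is None:
                    first_seen[i][y] = height
        clusters[a] = merged
        del clusters[b]

    scores = {}
    for i, y in enumerate(labels):
        if y is None:
            h_pos, h_neg = first_seen[i][+1], first_seen[i][-1]
            # Assumed score: gap between the two heights; ties get confidence 0.
            scores[i] = (abs(h_pos - h_neg), +1 if h_pos < h_neg else -1)
    return scores


def select_unlabeled(points, labels, keep_fraction=0.5):
    """Return indices of the most confident unlabeled points (hypothetical rule)."""
    scores = clustering_confidence(points, labels)
    ranked = sorted(scores, key=lambda i: scores[i][0], reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]
```

In a pipeline of the kind the abstract describes, only the returned indices would be passed to the S3VM as unlabeled data, and points lying midway between the classes would be left out. The pairwise search here is cubic in the number of points, so it is only suitable for small illustrations.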
