Authors: Heidy-Marisol Marin-Castro, Miguel Morales-Sandoval, L Enrique Sucar, Eduardo F Morales
DOI:
Keywords:
Abstract: This paper introduces a semi-supervised ensemble of classifiers called WSA (Weighted Semi-supervised AdaBoost). The ensemble can significantly improve data classification by exploiting both labeled and unlabeled data. WSA is based on AdaBoost, a supervised ensemble algorithm, but it also uses the unlabeled data during training. WSA works with a set of Naive Bayes base classifiers that are combined in a cascade, as in AdaBoost. At each stage of WSA, the current classifier is trained on the classification results for labeled and unlabeled data produced by the classifier at the previous stage. The newly trained classifier then classifies the data, and its results are used to train the next classifier of the ensemble. Unlike other semi-supervised approaches, WSA weights the unlabeled instances using a probabilistic measure of the labels predicted by the current classifier. This reduces the strong bias that dubious classifications of unlabeled data may introduce into semi-supervised learning algorithms. Experimental results on different benchmark data sets show that this technique significantly increases the performance of a semi-supervised learning algorithm.
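The staged training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the update formulas, so several choices here are assumptions. Specifically, the base learner is a Gaussian Naive Bayes for continuous features, labeled instances are reweighted with the standard AdaBoost rule, and each unlabeled instance is weighted by its highest posterior probability divided by the number of labeled instances. All function names (`fit_gnb`, `predict_proba`, `wsa_fit`, `wsa_predict`) are hypothetical.

```python
import math

def fit_gnb(X, y, w):
    """Weighted Gaussian Naive Bayes: class priors plus per-feature mean/variance."""
    model, total = {}, sum(w)
    for c in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == c]
        wc = sum(w[i] for i in idx)
        stats = []
        for j in range(len(X[0])):
            mean = sum(w[i] * X[i][j] for i in idx) / wc
            # Small floor keeps the variance strictly positive.
            var = sum(w[i] * (X[i][j] - mean) ** 2 for i in idx) / wc + 1e-3
            stats.append((mean, var))
        model[c] = (wc / total, stats)
    return model

def predict_proba(model, x):
    """Return {class: posterior probability} for a single instance."""
    logp = {}
    for c, (prior, stats) in model.items():
        lp = math.log(prior)
        for xj, (mean, var) in zip(x, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (xj - mean) ** 2 / (2 * var)
        logp[c] = lp
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return {c: math.exp(v - m) / z for c, v in logp.items()}

def wsa_fit(X_lab, y_lab, X_unl, rounds=5):
    """Boosting loop that also trains on pseudo-labeled, confidence-weighted data."""
    n = len(X_lab)
    w_lab = [1.0 / n] * n
    X, y, w = list(X_lab), list(y_lab), list(w_lab)  # first stage: labeled data only
    ensemble = []
    for _ in range(rounds):
        model = fit_gnb(X, y, w)
        # Weighted error is measured on the labeled instances only.
        preds = []
        for x in X_lab:
            p = predict_proba(model, x)
            preds.append(max(p, key=p.get))
        err = sum(wi for wi, p, t in zip(w_lab, preds, y_lab) if p != t)
        err = min(max(err, 1e-10), 1 - 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        # Standard AdaBoost reweighting of labeled instances (assumption).
        w_lab = [wi * math.exp(alpha if p != t else -alpha)
                 for wi, p, t in zip(w_lab, preds, y_lab)]
        total = sum(w_lab)
        w_lab = [wi / total for wi in w_lab]
        # Pseudo-label unlabeled instances; weight each by its top posterior
        # (assumed weighting scheme, scaled like an initial labeled weight).
        post = [predict_proba(model, x) for x in X_unl]
        y_unl = [max(p, key=p.get) for p in post]
        w_unl = [max(p.values()) / n for p in post]
        X, y, w = list(X_lab) + list(X_unl), list(y_lab) + y_unl, w_lab + w_unl
    return ensemble

def wsa_predict(ensemble, x):
    """Alpha-weighted vote over the stage classifiers."""
    votes = {}
    for alpha, model in ensemble:
        p = predict_proba(model, x)
        c = max(p, key=p.get)
        votes[c] = votes.get(c, 0.0) + alpha
    return max(votes, key=votes.get)
```

In this sketch, a confidently classified unlabeled instance enters the next stage with nearly the same weight as a fresh labeled one, while an ambiguous instance (posterior near uniform) contributes proportionally less, which is the bias-reduction idea the abstract describes.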