Identifying and eliminating mislabeled training instances

Authors: Carla E. Brodley, Mark A. Friedl

DOI:

Keywords:

Abstract: This paper presents a new approach to identifying and eliminating mislabeled training instances. The goal of this technique is to improve the classification accuracies produced by learning algorithms by improving the quality of the training data. The approach employs an ensemble of classifiers that serve as a filter for the training data. Using n-fold cross validation, the training data is passed through the filter. Only instances that the filter classifies correctly are passed to the final learning algorithm. We present an empirical evaluation of the approach for the task of automated land cover mapping from remotely sensed data. Labeling error arises in these data from a multitude of sources, including lack of consistency in the vegetation classification used, variable measurement techniques, and variation in the spatial sampling resolution. Our evaluation shows that for noise levels of less than 40%, filtering results in higher predictive accuracy than not filtering, and that for levels of class noise less than or equal to 20%, filtering allows the base-line accuracy to be retained. Our empirical results suggest that the ensemble filter approach is an effective method for identifying labeling errors and, further, that the approach will significantly benefit ongoing research to develop accurate and robust remote sensing-based methods to map land cover at global scales.
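The filtering procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the base learners or say how the ensemble's votes are combined, so the sketch assumes a majority vote and uses three simple stand-in learners (nearest centroid, 1-NN, 3-NN). All function names (`ensemble_filter`, `nearest_centroid`, `knn`, `sq_dist`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid(train_X, train_y):
    """Stand-in learner: predict the class whose mean vector is closest."""
    sums, counts = {}, Counter(train_y)
    for x, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def knn(k):
    """Stand-in learner factory: k-nearest-neighbour with a plurality vote."""
    def fit(train_X, train_y):
        def predict(x):
            order = sorted(range(len(train_X)),
                           key=lambda i: sq_dist(x, train_X[i]))
            return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]
        return predict
    return fit


def ensemble_filter(X, y, learners, n_folds=5, seed=0):
    """Return indices of instances kept by an n-fold cross-validated filter.

    Each instance is classified by models trained on the other folds.
    ASSUMPTION: it is kept only if a strict majority of models agree with
    its given label; the abstract does not specify the voting rule.
    """
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    kept = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        models = [fit(train_X, train_y) for fit in learners]
        for i in held_out:
            agree = sum(model(X[i]) == y[i] for model in models)
            if 2 * agree > len(models):
                kept.append(i)
    return sorted(kept)


# Toy data (hypothetical): two well-separated clusters of 20 points each.
points = [(i * 0.1, (i * 7 % 20) * 0.1) for i in range(20)]
X = points + [(a + 10.0, b + 10.0) for a, b in points]
y = [0] * 20 + [1] * 20
y[0], y[25] = 1, 0  # inject two labeling errors

kept = ensemble_filter(X, y, [nearest_centroid, knn(1), knn(3)])
clean_X = [X[i] for i in kept]  # data that would go to the final learner
clean_y = [y[i] for i in kept]
```

On this toy data the two deliberately mislabeled instances are rejected by every model and dropped, while the correctly labeled instances pass. A stricter rule that requires all models to disagree with the label before discarding an instance would remove fewer instances, at the risk of retaining more label noise.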

References (15)
David H. Wolpert. Stacked generalization. Neural Networks, vol. 5, pp. 241-259 (1992). doi:10.1016/S0893-6080(05)80023-1
Andrea Pohoreckyj Danyluk, Foster John Provost. Small Disjuncts in Action: Learning to Diagnose Errors in the Local Loop of the Telephone Network. International Conference on Machine Learning, pp. 81-88 (1993). doi:10.1016/B978-1-55860-307-3.50017-4
David D. Lewis, Jason Catlett. Heterogeneous Uncertainty Sampling for Supervised Learning. Machine Learning Proceedings 1994, pp. 148-156 (1994). doi:10.1016/B978-1-55860-335-6.50026-X
Carla E. Brodley, Paul E. Utgoff. Multivariate Decision Trees. Machine Learning, vol. 19, pp. 45-77 (1995). doi:10.1023/A:1022607123649
Dennis L. Wilson. Asymptotic Properties of Nearest Neighbor Rules Using Edited Data. IEEE Transactions on Systems, Man, and Cybernetics, vol. 2, pp. 408-421 (1972). doi:10.1109/TSMC.1972.4309137
Edwina L. Rissland, David B. Skalak. Inductive learning in a mixed paradigm setting. National Conference on Artificial Intelligence, pp. 840-847 (1990).
L. K. Hansen, P. Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 993-1001 (1990). doi:10.1109/34.58871
R. S. DeFries, J. R. G. Townshend. NDVI-derived land cover classifications at a global scale. International Journal of Remote Sensing, vol. 15, pp. 3567-3586 (1994). doi:10.1080/01431169408954345
David W. Aha, Dennis Kibler, Marc K. Albert. Instance-Based Learning Algorithms. Machine Learning, vol. 6, pp. 37-66 (1991). doi:10.1023/A:1022689900470