Handling Missing Values when Applying Classification Models

Authors: Maytal Saar-Tsechansky, Foster Provost

Abstract: Much work has studied the effect of different treatments of missing values on model induction, but little work has analyzed treatments for the common case of missing values at prediction time. This paper first compares several different methods for applying classification trees to instances with missing values: predictive value imputation, the distribution-based imputation used by C4.5, and using reduced models. It also shows evidence that the results generalize to bagged trees and to logistic regression. The results show that for the two most popular treatments, each is preferable under different conditions. Strikingly, the reduced-models approach, seldom mentioned or used, consistently outperforms the other two methods, sometimes by a large margin. The lack of attention to reduced modeling may be due in part to its (perceived) expense in terms of computation or storage. Therefore, we then introduce and evaluate alternative, hybrid approaches that allow users to balance between more accurate but computationally expensive reduced modeling and the other, less accurate but less computationally expensive treatments. The results show that the hybrid methods can scale gracefully to the amount of investment in computation/storage, and that they outperform imputation even for small investments.
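The abstract contrasts three ways to apply a trained tree to an instance whose feature is missing at prediction time. The Python sketch below illustrates them on a hypothetical two-feature toy dataset. The data, the small Gini-based tree learner, and the function names (`build_tree`, `predict_distribution`, `impute_mean`, `predict_reduced`) are illustrative assumptions, not the authors' implementation; in particular, mean imputation stands in for a full predictive-value model, and the hybrid methods are not shown.

```python
from collections import Counter

# Hypothetical toy data: feature "a" separates the classes perfectly,
# feature "b" is informative but noisier.
TRAIN = [
    ({"a": 1.0, "b": 1.0}, "neg"),
    ({"a": 2.0, "b": 2.0}, "neg"),
    ({"a": 3.0, "b": 1.5}, "neg"),
    ({"a": 6.0, "b": 5.0}, "pos"),
    ({"a": 7.0, "b": 6.0}, "pos"),
    ({"a": 7.5, "b": 1.2}, "pos"),
]
FEATURES = ["a", "b"]


def distribution(labels):
    """Class frequencies as probabilities."""
    n = len(labels)
    return {k: c / n for k, c in Counter(labels).items()}


def gini(labels):
    return 1.0 - sum(p * p for p in distribution(labels).values())


def build_tree(rows, features, depth=2):
    """Greedy binary tree on numeric features (Gini splits, midpoint thresholds)."""
    labels = [y for _, y in rows]
    if depth == 0 or len(set(labels)) == 1 or not features:
        return {"leaf": distribution(labels)}
    best = None
    for f in features:
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every remaining feature is constant on these rows
        return {"leaf": distribution(labels)}
    _, f, t, left, right = best
    return {
        "feature": f,
        "threshold": t,
        # Fraction of training rows sent left; reused for missing values below.
        "p_left": len(left) / len(rows),
        "left": build_tree(left, features, depth - 1),
        "right": build_tree(right, features, depth - 1),
    }


def predict_distribution(tree, x):
    """Distribution-based imputation in the style of C4.5: when the split
    feature is missing, follow both branches and weight each branch's class
    distribution by the fraction of training rows that went that way."""
    if "leaf" in tree:
        return dict(tree["leaf"])
    f = tree["feature"]
    if f in x:
        branch = "left" if x[f] <= tree["threshold"] else "right"
        return predict_distribution(tree[branch], x)
    combined = Counter()
    for branch, weight in (("left", tree["p_left"]), ("right", 1.0 - tree["p_left"])):
        for label, p in predict_distribution(tree[branch], x).items():
            combined[label] += weight * p
    return dict(combined)


def impute_mean(x, rows, features):
    """Value imputation, using the training mean as the simplest stand-in
    for a predicted value (a richer version would regress on other features)."""
    filled = dict(x)
    for f in features:
        if f not in filled:
            filled[f] = sum(r[0][f] for r in rows) / len(rows)
    return filled


def predict_reduced(rows, features, x, cache):
    """Reduced model: train on only the features present in x. Caching one
    model per pattern of available features is the source of the
    computation/storage cost the abstract mentions."""
    available = tuple(f for f in features if f in x)
    if available not in cache:
        cache[available] = build_tree(rows, list(available))
    return predict_distribution(cache[available], x)


if __name__ == "__main__":
    full_tree = build_tree(TRAIN, FEATURES)
    query = {"b": 5.5}  # feature "a" is missing at prediction time
    print(predict_distribution(full_tree, impute_mean(query, TRAIN, FEATURES)))  # {'neg': 1.0}
    print(predict_distribution(full_tree, query))  # {'neg': 0.5, 'pos': 0.5}
    print(predict_reduced(TRAIN, FEATURES, query, {}))  # {'pos': 1.0}
```

On this single hand-picked query, mean imputation commits to `neg` because the imputed value of `a` falls below the split, distribution-based imputation splits the vote evenly, and the reduced model, trained on `b` alone, predicts `pos`. It only shows how the treatments differ mechanically; it is not evidence about the paper's empirical comparison.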
