Comparison of various methods for handling incomplete data in software engineering databases

Authors: B. Twala, M. Cartwright, M. Shepperd

DOI: 10.1109/ISESE.2005.1541819

Abstract: Increasing awareness of how missing data affects software predictive accuracy has led to an increasing number of missing data techniques (MDTs). This paper investigates the robustness and accuracy of eight popular techniques for tolerating incomplete training and test data using tree-based models. The MDTs were compared by artificially simulating different proportions, patterns, and mechanisms of missing data. A 4-way repeated measures design was employed to analyze the data. The simulation results suggest important differences. Listwise deletion is substantially inferior, while multiple imputation (MI) represents a superior approach to handling missing data. Decision tree single imputation and surrogate variables splitting are more severely impacted by missing values distributed among all attributes. MI should be used if the data contain many missing values. If few values are missing, any of the MDTs might be considered. The choice of technique should be guided by the pattern of missing data.
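As a rough illustration of the kind of simulation the abstract describes, the sketch below removes values from a complete dataset at a chosen proportion and then applies two simple treatments. It is not the authors' experimental setup. It assumes a completely-at-random mechanism only, uses synthetic Gaussian data, and substitutes column-mean imputation (a single-imputation method) for the multiple imputation studied in the paper. The function names `inject_mcar`, `listwise_deletion`, and `mean_imputation` are hypothetical.

```python
import random


def inject_mcar(rows, proportion, seed=0):
    """Return a copy of rows in which each cell is independently
    replaced by None with probability `proportion` (assumed MCAR)."""
    rng = random.Random(seed)
    return [[None if rng.random() < proportion else v for v in row]
            for row in rows]


def listwise_deletion(rows):
    """Keep only the rows that have no missing cell."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_imputation(rows):
    """Fill each missing cell with the mean of the observed values in
    its column (single imputation, not multiple imputation)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 when a column has no observed value at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic stand-in for a complete project dataset: 200 rows, 4 attributes.
    complete = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
    for proportion in (0.05, 0.15, 0.30):
        incomplete = inject_mcar(complete, proportion, seed=1)
        kept = listwise_deletion(incomplete)
        filled = mean_imputation(incomplete)
        print(f"missing={proportion:.0%}: listwise deletion keeps "
              f"{len(kept)} of {len(incomplete)} rows; "
              f"imputation keeps {len(filled)}")
```

With four attributes, the share of rows that survive listwise deletion shrinks quickly as the missing proportion grows, which is one intuition for why deletion loses so much training data when missing values are spread across all attributes.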

References (29)
Matthew Evett, Edward Allen, Pei-der Chien, Taghi Khoshgoftaar. GP-based software quality prediction (1998).
C. Wohlin, P. Jonsson. An evaluation of k-nearest neighbour imputation using Likert data. IEEE International Software Metrics Symposium, pp. 108-118 (2004). DOI: 10.1109/METRICS.2004.10
Richard A. Olshen, Charles J. Stone, Leo Breiman, Jerome H. Friedman. Classification and regression trees (1983).
Kamakshi Lakshminarayan, Steven A. Harp, Tariq Samad. Imputation of Missing Data in Industrial Databases. Applied Intelligence, vol. 11, pp. 259-275 (1999). DOI: 10.1023/A:1008334909089
James Dougherty, Ron Kohavi, Mehran Sahami. Supervised and Unsupervised Discretization of Continuous Features. Machine Learning Proceedings 1995, pp. 194-202 (1995). DOI: 10.1016/B978-1-55860-377-6.50032-3
Gustavo E. A. P. A. Batista, Maria Carolina Monard. An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence, vol. 17, pp. 519-533 (2003). DOI: 10.1080/713827181
Joseph L. Schafer, Maren K. Olsen. Multiple Imputation for Multivariate Missing-Data Problems: A Data Analyst's Perspective. Multivariate Behavioral Research, vol. 33, pp. 545-571 (1998). DOI: 10.1207/S15327906MBR3304_5
Roderick J. A. Little, Donald B. Rubin. Statistical Analysis with Missing Data (1987).