Authors: B. Twala, M. Cartwright, M. Shepperd
DOI: 10.1109/ISESE.2005.1541819
Keywords:
Abstract: Increasing awareness of how missing data affects software predictive accuracy has led to an increasing number of missing data techniques (MDTs). This paper investigates the robustness and accuracy of eight popular techniques for tolerating incomplete training and test data using tree-based models. The MDTs were compared by artificially simulating different proportions, patterns, and mechanisms of missing data. A 4-way repeated measures design was employed to analyze the data. The simulation results suggest important differences. Listwise deletion is substantially inferior, while multiple imputation (MI) represents a superior approach to handling missing data. Decision tree single imputation and surrogate variables splitting are more severely impacted by missing values distributed among all attributes. MI should be used if the data contain many missing values. If few values are missing, any of the MDTs might be considered. The choice of technique should be guided by the pattern of missing data.
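Illustration: The abstract's setup (artificially removing values at different proportions, then applying an MDT) can be sketched in a few lines. This is only a minimal sketch, not the authors' procedure. It assumes a missing-completely-at-random mechanism for simplicity, uses a made-up numeric dataset, and the helper names `inject_mcar` and `listwise_delete` are hypothetical. It shows only how quickly listwise deletion discards cases as the missing proportion grows.

```python
import random


def inject_mcar(rows, proportion, seed=0):
    """Return a copy of `rows` where each cell is replaced by None
    independently with probability `proportion` (an MCAR assumption)."""
    rng = random.Random(seed)
    return [
        [None if rng.random() < proportion else value for value in row]
        for row in rows
    ]


def listwise_delete(rows):
    """Keep only the rows that have no missing (None) cells."""
    return [row for row in rows if all(value is not None for value in row)]


# Made-up data: 100 cases with 3 numeric attributes each.
rows = [[float(i), float(i * 2), float(i % 3)] for i in range(100)]

for proportion in (0.0, 0.1, 0.3):
    incomplete = inject_mcar(rows, proportion)
    complete_cases = listwise_delete(incomplete)
    print(f"missing proportion {proportion}: {len(complete_cases)} complete cases left")
```

Because a case is dropped when any one of its attributes is missing, the share of surviving cases shrinks much faster than the per-cell missing proportion. This is one intuition for why listwise deletion can fare poorly when missing values are spread across all attributes.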