Should we impute or should we weight? Examining the performance of two CART-based techniques for addressing missing data in small sample research with nonnormal variables

Authors: Timothy Hayes, John J. McArdle

DOI: 10.1016/J.CSDA.2017.05.006

Keywords:

Abstract: Recently, researchers have proposed a variety of new methods for employing exploratory data mining algorithms to address missing data. Two promising classes of methods take advantage of classification and regression trees (CART) and random forests. A first method uses the predicted probabilities of response (vs. non-response) generated by a CART analysis to create inverse probability weights. This method has been shown to perform well in prior simulations when nonresponse was generated by tree-based structures, even under low sample sizes. A second method uses the values falling in the terminal nodes of a CART analysis to generate multiple imputations. In prior studies, these methods performed well at estimating main effects and interactions in regression models when sample sizes were large (N = 1000), but their performance was not evaluated under small sample conditions. In the present research, we assess the performance of CART-based weights and CART-based imputations under small sample sizes (N = 125 or 250) and nonnormality when nonresponse is generated by smooth functions (linear, quadratic, cubic, interactive). Results suggest that random forest weights excel under small sample sizes, regardless of nonnormality, whereas CART multiple imputation is more efficient with larger samples (N = 500 or 1000).
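The first method in the abstract, inverse probability weighting from tree-predicted response probabilities, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the names `best_split` and `node_probability`, and the restriction to a single Gini-based split (a depth-one tree) are all assumptions made for illustration. A real analysis would grow and prune a full CART or random forest.

```python
# Hypothetical toy data: x is fully observed; y is missing (None) for nonrespondents.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, None, 4.0, None, 6.0, 7.0, 8.0, 9.0]
responded = [0 if v is None else 1 for v in y]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(xs, labels):
    """Return the threshold on xs that minimizes the size-weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0  # candidate split halfway between adjacent values
        left = [r for v, r in zip(xs, labels) if v <= t]
        right = [r for v, r in zip(xs, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


# Assumption: one split stands in for a full tree on response status.
threshold = best_split(x, responded)


def node_probability(xi):
    """Proportion of respondents in the terminal node that contains xi."""
    same_node = [r for v, r in zip(x, responded)
                 if (v <= threshold) == (xi <= threshold)]
    return sum(same_node) / len(same_node)


p_response = [node_probability(xi) for xi in x]

# Respondents are weighted by 1 / P(response); nonrespondents get weight 0.
weights = [1.0 / p if r else 0.0 for p, r in zip(p_response, responded)]

# Weighted mean of y over the observed cases only.
weighted_mean = sum(w * v for w, v in zip(weights, y) if v is not None) / sum(weights)
print(threshold, weights, weighted_mean)
```

In this toy data the low-x node has a 50% response rate, so its respondents count double. This pulls the weighted mean below the unweighted respondent mean, which over-represents the high-x cases.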