Authors: Nathaniel Schenker, Jeremy M.G. Taylor
DOI: 10.1016/0167-9473(95)00057-7
Keywords:
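Abstract: Multiple imputation is a technique for handling data sets with missing values. The method fills in the missing values several times, creating several completed data sets for analysis. Each data set is analyzed separately using techniques designed for complete data, and the results are then combined in such a way that the variability due to imputation may be incorporated. Methods of imputing the missing values can vary from fully parametric to nonparametric. In this paper, we compare partially parametric and fully parametric regression-based multiple-imputation methods. The fully parametric method that we consider imputes missing regression outcomes by drawing them from their predictive distribution under the regression model, whereas the partially parametric methods are based on imputing outcomes or residuals for incomplete cases using values drawn from the complete cases. For the partially parametric methods, we suggest a new approach to choosing the complete cases from which to draw values. In a Monte Carlo study in the regression setting, we investigate the robustness of the multiple-imputation schemes to misspecification of the underlying model for the data. Sources of model misspecification considered include incorrect modeling of the mean structure as well as incorrect specification of the error distribution with regard to heaviness of the tails and heteroscedasticity. The methods are compared with respect to the bias and efficiency of point estimates and the coverage rates of confidence intervals for the marginal mean and distribution function of the outcome. We find that when the mean structure is specified correctly, all methods perform well, even if the error distribution is misspecified. The fully parametric approach, however, produces slightly more efficient estimates of the marginal distribution function of the outcome than do the partially parametric approaches. When the mean structure is misspecified, all methods still perform well for estimating the marginal mean, although the fully parametric method shows slight increases in bias and variance. For estimating the marginal distribution function, however, the fully parametric method breaks down in several situations, whereas the partially parametric methods maintain their good performance. In an application to AIDS research, in a setting similar to but slightly more complicated than that of the Monte Carlo study, we examine how estimates of the distribution of the time from infection with HIV to the onset of AIDS vary with the method used to impute the residual time to AIDS for subjects with right-censored data. The fully parametric and partially parametric techniques produce similar results, suggesting that the model selection used for fully parametric imputation was adequate. Our application provides an example of how multiple imputation can be used to combine information from two cohorts to estimate quantities that cannot be estimated directly from either one of the cohorts separately.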
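As an illustration of the contrast drawn in the abstract, the following is a minimal Python sketch, not the authors' procedure. It assumes a simple linear regression with one covariate. The function names (`fit_ols`, `impute_once`, `rubin_combine`) are hypothetical. Refitting on a bootstrap resample of the complete cases is an assumed, commonly used device for propagating parameter uncertainty. The fully parametric branch draws normal errors under the model, while the partially parametric branch draws residuals observed in the complete cases. The last function applies Rubin's standard combining rules to the per-data-set estimates.

```python
import numpy as np

def fit_ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns coefficients, residuals, residual SD."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma = np.sqrt(resid @ resid / (len(y) - 2))
    return beta, resid, sigma

def impute_once(x, y, missing, rng, parametric):
    """Create one completed copy of y (illustrative sketch only)."""
    xo, yo = x[~missing], y[~missing]
    # Assumption: a bootstrap resample of complete cases stands in for a
    # posterior draw of the regression parameters.
    idx = rng.integers(0, len(xo), size=len(xo))
    beta, resid, sigma = fit_ols(xo[idx], yo[idx])
    pred = beta[0] + beta[1] * x[missing]
    if parametric:
        # Fully parametric: normal errors under the regression model.
        noise = rng.normal(0.0, sigma, size=pred.shape)
    else:
        # Partially parametric: residuals drawn from the complete cases.
        noise = rng.choice(resid, size=pred.shape, replace=True)
    completed = y.copy()
    completed[missing] = pred + noise
    return completed

def rubin_combine(estimates, variances):
    """Rubin's rules: pooled estimate and total variance over m completed data sets."""
    m = len(estimates)
    q_bar = np.mean(estimates)             # pooled point estimate
    within = np.mean(variances)            # average within-imputation variance
    between = np.var(estimates, ddof=1)    # between-imputation variance
    return q_bar, within + (1.0 + 1.0 / m) * between
```

In use, one would call `impute_once` several times with the same data, estimate the marginal mean and its variance on each completed data set, and pass the resulting lists to `rubin_combine`. The choice of complete cases from which to draw residuals is uniform here; the paper's suggested approach to that choice is not reproduced.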