Comparative methods for handling missing data in large databases

Authors: Antonia J. Henry, Nathanael D. Hevelone, Stuart Lipsitz, Louis L. Nguyen

DOI: 10.1016/J.JVS.2013.05.008

Keywords:

Abstract:

Objective: Analysis of complex survey databases is an important tool for health services researchers. Missing data elements are challenging because the reasons for "missingness" are multifactorial, especially for categorical variables such as race. We simulated missing race and analyzed the bias of five methods for handling missing data in a model predicting major amputation in patients with critical limb ischemia (CLI).

Methods: Fully observed patient discharges containing lower extremity revascularization or CLI were selected from the 2003 to 2007 Nationwide Inpatient Sample, a complex survey database (weighted n = 684,057). Considering several random missing data schemes, we compared five methods: complete case analysis, replacement with frequencies, the indicator variable method, multiple imputation, and reweighted estimating equations. We created 100 missing data sets, with 5%, 15%, and 30% of subjects' race drawn to be missing from the fully observed set. Bias was estimated by comparing the regression coefficients averaged over the missing data sets ($\beta_{\text{miss}}$) for each method with the estimates from the fully observed set ($\beta_{\text{full}}$), and relative bias was calculated as $(\beta_{\text{miss}} - \beta_{\text{full}})/\beta_{\text{full}} \times 100\%$.

Results: Our results demonstrate that reweighted estimating equations produce the least biased coefficients and that the indicator variable method produces the most biased coefficients. Complete case analysis and multiple imputation resulted in moderate bias. Sensitivity analysis demonstrated that the optimal choice of method depends on the quantity and type of missing data encountered.

Conclusions: Missing data are an important analytic topic in health services research using large databases. The commonly used indicator variable method introduces severe bias and should be used with caution. We present empiric evidence to guide the selection of methods for handling missing data.
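The simulation design and relative bias calculation described in the Methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it covers only complete case analysis, removes a binary covariate completely at random (MCAR), uses the log odds ratio of a 2x2 table as the regression coefficient (for a single binary covariate this equals the logistic regression slope), ignores the Nationwide Inpatient Sample survey weights, and runs on hypothetical synthetic data. The helper names `mask_mcar`, `complete_case_log_or`, `relative_bias`, and `simulate` are illustrative only.

```python
import math
import random
from statistics import mean


def mask_mcar(values, fraction, rng):
    """Copy `values`, setting a random `fraction` of entries to None (MCAR)."""
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def complete_case_log_or(xs, ys):
    """Log odds ratio of binary y on binary x, dropping records with missing x.

    For a single binary covariate this equals the logistic regression slope.
    Assumes every cell of the 2x2 table is non-empty.
    """
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for x, y in zip(xs, ys):
        if x is None:  # complete case analysis: discard incomplete records
            continue
        counts[(x, y)] += 1
    a, b = counts[(1, 1)], counts[(1, 0)]
    c, d = counts[(0, 1)], counts[(0, 0)]
    return math.log((a * d) / (b * c))


def relative_bias(beta_miss_replicates, beta_full):
    """Relative bias in percent: (mean(beta_miss) - beta_full) / beta_full * 100."""
    if beta_full == 0:
        raise ValueError("relative bias is undefined when beta_full is 0")
    beta_miss = mean(beta_miss_replicates)  # average over missing data sets
    return (beta_miss - beta_full) / beta_full * 100.0


def simulate(xs, ys, fractions=(0.05, 0.15, 0.30), n_sets=100, seed=0):
    """Relative bias of complete case analysis for each missing fraction."""
    rng = random.Random(seed)
    beta_full = complete_case_log_or(xs, ys)  # fully observed estimate
    results = {}
    for fraction in fractions:
        replicates = [
            complete_case_log_or(mask_mcar(xs, fraction, rng), ys)
            for _ in range(n_sets)
        ]
        results[fraction] = relative_bias(replicates, beta_full)
    return results


if __name__ == "__main__":
    # Hypothetical synthetic data: binary covariate x, binary outcome y.
    data_rng = random.Random(42)
    xs = [1 if data_rng.random() < 0.3 else 0 for _ in range(5000)]
    ys = [1 if data_rng.random() < (0.25 if x else 0.10) else 0 for x in xs]
    for fraction, bias in simulate(xs, ys).items():
        print(f"{fraction:.0%} missing: relative bias {bias:+.2f}%")
```

Because values are removed completely at random here, the complete case estimate should stay close to the fully observed one, so the printed relative bias should hover near zero. The other four methods would replace `complete_case_log_or` as alternative estimators inside `simulate`; implementing them (for example, multiple imputation or reweighted estimating equations) requires a fuller regression framework than this sketch provides.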
