An evaluation of k-nearest neighbour imputation using Likert data

Authors: C. Wohlin, P. Jonsson

DOI: 10.1109/METRICS.2004.10

Abstract: Studies in many different fields of research suffer from the problem of missing data. With missing data, statistical tests will lose power, results may be biased, or analysis may not be feasible at all. There are several ways to handle the problem, for example through imputation. With imputation, missing values are replaced with estimated values according to an imputation method or model. In the k-nearest neighbour (k-NN) method, a case is imputed using values from the k most similar cases. In this paper, we present an evaluation of the k-NN method using Likert data in a software engineering context. We simulate the method with different values of k and for different percentages of missing data. Our findings indicate that it is feasible to use the k-NN method with Likert data. We suggest that a suitable value of k is approximately the square root of the number of complete cases. We also show that by relaxing the method rules with respect to selecting neighbours, the ability of the method to impute remains high for large amounts of missing data without affecting the quality of the imputation.
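
The k-NN procedure summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `knn_impute`, the Euclidean distance over the observed items, and the use of the lower median to combine neighbour values are all assumptions made for illustration. Only the default choice of k (roughly the square root of the number of complete cases) comes from the abstract, and the relaxed neighbour-selection rules it mentions are not implemented here.

```python
import math
import statistics


def knn_impute(cases, k=None):
    """Fill None entries in Likert-scale cases from the k most similar complete cases.

    Hypothetical helper for illustration; distance metric and median rule are assumptions.
    """
    # Only cases without missing answers are allowed to act as donors.
    complete = [c for c in cases if None not in c]
    if not complete:
        raise ValueError("at least one complete case is required")

    # Default k: roughly the square root of the number of complete cases.
    if k is None:
        k = max(1, round(math.sqrt(len(complete))))
    k = min(k, len(complete))

    imputed = []
    for case in cases:
        if None not in case:
            imputed.append(list(case))
            continue

        # Compare cases only on the items this respondent actually answered.
        observed = [i for i, v in enumerate(case) if v is not None]

        def distance(donor, case=case, observed=observed):
            return math.sqrt(sum((case[i] - donor[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=distance)[:k]

        filled = list(case)
        for i, value in enumerate(case):
            if value is None:
                # The lower median always returns a value that exists on the scale.
                filled[i] = statistics.median_low(n[i] for n in neighbours)
        imputed.append(filled)
    return imputed


# Example: five respondents, three 5-point Likert items, one missing answer.
answers = [
    [1, 2, 1],
    [1, 1, 2],
    [5, 4, 5],
    [5, 5, 4],
    [1, 2, None],
]
print(knn_impute(answers))
```

With four complete cases the default is k = 2, so the missing answer is taken from the two respondents whose observed answers are closest. Using the lower median rather than the mean is a design choice that keeps the imputed value an integer on the Likert scale.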
