Dealing with missing software project data

Authors: M.H. Cartwright, M.J. Shepperd, Q. Song

DOI: 10.1109/METRIC.2003.1232464

Abstract: Whilst there is a general consensus that quantitative approaches are an important part of successful software project management, there has been relatively little research into many of the obstacles to data collection and analysis in the real world. One feature that characterises many of the data sets we deal with is missing or highly questionable values. Naturally this problem is not unique to software engineering, so we explore the application of two existing data imputation techniques that have been used to good effect elsewhere. In order to assess the potential value of imputation we use two industrial data sets. Both are quite problematic from an effort modelling perspective because they contain few cases, have a significant number of missing values, and the projects are quite heterogeneous. We compare the quality of fit of effort models derived by stepwise regression on the raw data and on data sets with values imputed by the various techniques. In both data sets we find that k-nearest neighbour (k-NN) imputation and sample mean imputation (SMI) significantly improve the model fit, with k-NN giving the best results. These results are consistent with other recently published results; consequently we conclude that imputation can assist empirical software engineering.
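To make the two imputation techniques named in the abstract concrete, here is a minimal Python sketch of sample mean imputation and k-NN imputation on a small numeric table. It is an illustration under stated assumptions, not the authors' implementation: the function names, the distance measure (root mean squared difference over features observed in both rows), the choice of k, and the toy project data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def sample_mean_impute(data):
    """Replace each missing value (NaN) with the mean of the observed
    values in the same column."""
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    col_means = np.nanmean(data, axis=0)
    rows, cols = np.where(np.isnan(data))
    filled[rows, cols] = col_means[cols]
    return filled


def knn_impute(data, k=2):
    """Replace each missing value with the mean of that feature over the
    k nearest rows that have the feature observed.

    Assumption: distance is the root mean squared difference over the
    features observed in both rows. Other distance choices are possible.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    missing = np.isnan(data)
    for i in np.where(missing.any(axis=1))[0]:
        for j in np.where(missing[i])[0]:
            candidates = []
            for d in range(len(data)):
                # A donor must be a different row with feature j observed.
                if d == i or missing[d, j]:
                    continue
                shared = ~missing[i] & ~missing[d]
                if not shared.any():
                    continue
                diff = data[i, shared] - data[d, shared]
                dist = np.sqrt(np.mean(diff ** 2))
                candidates.append((dist, data[d, j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                filled[i, j] = np.mean([v for _, v in candidates[:k]])
            # If no donor exists, the value is left missing.
    return filled


# Hypothetical toy data: rows are projects, columns are
# (team size, size measure, effort). One effort value is missing.
projects = np.array([
    [10.0, 100.0, 5.0],
    [12.0, 110.0, np.nan],
    [11.0, 105.0, 6.0],
    [50.0, 900.0, 40.0],
])

smi_filled = sample_mean_impute(projects)
knn_filled = knn_impute(projects, k=2)
```

On this toy table, sample mean imputation fills the gap with 17.0, because the one large project pulls the column mean up, while k-NN with k=2 fills it with 5.5, the mean of the two most similar projects. This only illustrates why k-NN can behave better on heterogeneous projects; it says nothing about the size of the effect reported in the paper.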
