An empirical analysis of data preprocessing for machine learning-based software cost estimation

Authors: Jianglin Huang, Yan-Fu Li, Min Xie

DOI: 10.1016/J.INFSOF.2015.07.004

Abstract: Context: Due to the complex nature of the software development process, traditional parametric models and statistical methods often appear to be inadequate to model the increasingly complicated relationship between project cost and project features (or cost drivers). Machine learning (ML) methods, with several reported successful applications, have gained popularity for software cost estimation in recent years. Data preprocessing has been claimed by many researchers to be a fundamental stage of ML methods; however, very few works have focused on the effects of data preprocessing techniques. Objective: This study aims at an empirical assessment of the effectiveness of data preprocessing techniques on ML methods in the context of software cost estimation. Method: In this work, we first conduct a literature survey of publications using data preprocessing techniques, followed by a systematic empirical study to analyze the strengths and weaknesses of individual data preprocessing techniques as well as their combinations. Results: Our results indicate that data preprocessing techniques may significantly influence the final prediction. They sometimes might have negative impacts on the prediction performance of ML methods. Conclusion: In order to reduce prediction errors and improve efficiency, a careful selection of data preprocessing techniques is necessary according to the characteristics of the machine learning methods and of the datasets used for software cost estimation.
