A Drift Propensity Detection Technique to Improve the Performance for Cross-Version Software Defect Prediction

Authors: Md Alamgir Kabir, Jacky W. Keung, Kwabena E. Bennin, Miao Zhang

DOI: 10.1109/COMPSAC48688.2020.0-154

Abstract: In cross-version defect prediction (CVDP), historical data derived from the prior version of the same project is used to predict defects in the current version. Recent CVDP studies focus on subset selection to deal with changes in data distributions. No study has focused on training data arriving in a streaming fashion across versions, where significant differences between versions make predictions unreliable. We refer to this situation as Drift Propensity (DP). By identifying DP, necessary steps can be taken (e.g., updating or retraining the model) to improve prediction performance. In this paper, we investigate chronological datasets and identify DP within them. A no-memory management technique is employed to manage the distributions, and a detection technique is proposed. The idea behind the proposed technique is to monitor the learning algorithm's error rate: when the detector triggers a warning, control flags indicate that steps should be taken. The proposed detector's results are significantly superior (p-value < 0.05), and the identified DPs achieve large effect sizes (Hedges' g ≥ 0.80) in pair-wise comparisons. We observe that an exponentially increasing error rate results in performance deterioration. We thus recommend that researchers and practitioners address DP; due to its potential effects on datasets, prediction models could be enhanced to achieve the best results in CVDP.
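The abstract describes a detector that monitors the learning algorithm's error rate and raises warning and drift flags. The paper's exact technique and thresholds are not given here, so the following is an illustrative sketch only, using the classic DDM-style heuristic (flag a warning or drift when the running error rate plus its standard deviation exceeds the recorded minimum by two or three standard deviations); all names and parameters are assumptions for illustration.

```python
import math

class ErrorRateDriftDetector:
    """Illustrative DDM-style detector (not the paper's exact method).
    Tracks the running error rate p and its std s over a stream of
    prediction outcomes; flags 'warning'/'drift' when p + s exceeds
    the recorded minimum by 2 or 3 standard deviations."""

    WARMUP = 30  # minimum samples before detection (common DDM convention)

    def __init__(self):
        self.n = 0                     # predictions seen so far
        self.errors = 0                # misclassifications seen so far
        self.p_min = float("inf")      # p at the minimum of p + s
        self.s_min = float("inf")      # s at the minimum of p + s

    def update(self, is_error: bool) -> str:
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.WARMUP:
            return "stable"            # warm-up: no detection yet
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s >= self.p_min + 3 * self.s_min:
            return "drift"             # e.g., retrain/update the model
        if p + s >= self.p_min + 2 * self.s_min:
            return "warning"           # e.g., start buffering recent data
        return "stable"

detector = ErrorRateDriftDetector()
# ~10% error rate at first, then an all-error burst simulating a new version.
stream = [i % 10 == 9 for i in range(100)] + [True] * 50
states = [detector.update(e) for e in stream]
print(states[-1])  # "drift"
```

In this sketch the warning flag fires shortly after the error burst begins and escalates to a drift flag as the error rate keeps rising, mirroring the abstract's point that an exponentially increasing error rate signals performance deterioration.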
