Authors: Md Alamgir Kabir, Jacky W. Keung, Kwabena E. Bennin, Miao Zhang
DOI: 10.1109/COMPSAC48688.2020.0-154
Keywords:
Abstract: In cross-version defect prediction (CVDP), historical data is derived from the prior version of the same project to predict defects in the current version. Recent studies in CVDP focus on subset selection to deal with changes in data distributions. No prior study has focused on training data arriving in a streaming fashion across versions, where significant differences between versions make the prediction unreliable. We refer to this situation as Drift Propensity (DP). By identifying DP, necessary steps can be taken (e.g., updating or retraining the model) to improve prediction performance. In this paper, we investigate chronological defect datasets and identify DP in the datasets. The no-memory data management technique is employed to manage the data distributions, and a DP detection technique is proposed. The idea behind the proposed technique is to monitor the algorithm's error-rate. The DP detector triggers DP, warning, and control flags so that the necessary steps can be taken. The proposed technique is significantly superior in identifying DP in the data distributions (p-value < 0.05). The DPs identified in the data distributions achieve large effect sizes (Hedges' g ≥ 0.80) during pair-wise comparisons. We observe that if the error-rate increases exponentially, it causes DP, resulting in prediction performance deterioration. We thus recommend that researchers and practitioners address DP in chronological defect datasets. Due to its potential effects on the datasets, prediction models could be enhanced to get the best results in CVDP.
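The error-rate monitoring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the detector's statistics or thresholds, so the code below is not the authors' technique: it assumes a detector in the style of the Drift Detection Method (DDM), which tracks the running error rate p and its standard deviation s and compares p + s against the lowest value seen so far. The class name `ErrorRateDriftDetector`, the helper `flag_stream`, the flag strings, and the parameters `warning_level`, `drift_level`, and `min_samples` are all hypothetical.

```python
import math


class ErrorRateDriftDetector:
    """DDM-style error-rate monitor (illustrative; not the paper's detector)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_samples=30):
        # Thresholds are assumptions borrowed from the usual DDM defaults.
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        # Discard all statistics, so nothing from before a drift is remembered.
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, misclassified):
        """Consume one prediction outcome and return 'control', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(misclassified)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its standard deviation
        if self.n < self.min_samples:
            return "control"                     # too few samples to judge
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s        # remember the best level so far
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                         # model should be updated or retrained
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "control"


def flag_stream(detector, outcomes):
    """Return the flag raised after each outcome (True means misclassified)."""
    return [detector.update(o) for o in outcomes]
```

In use, each prediction on an incoming module would be passed to `update` once its true label is known; a "drift" flag would be the point at which the model is updated or retrained, as the abstract suggests.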