Authors: Kehan Gao, Taghi M. Khoshgoftaar, Amri Napolitano
Keywords:
Abstract: Software defect prediction can be considered a binary classification problem. Generally, practitioners utilize historical software data, including metric and fault data collected during the development process, to build a model and then employ this model to predict new program modules as either fault-prone (fp) or not-fault-prone (nfp). Limited project resources can then be allocated according to the prediction results by (for example) assigning more reviews and more testing to the modules predicted to be potentially defective. Two challenges often come with the modeling process: (1) high dimensionality of the software measurement data and (2) skewed or imbalanced distributions between the two types of modules (fp and nfp) in those datasets. To overcome these problems, extensive studies have been dedicated towards improving the quality of training data. The commonly used techniques are feature selection and data sampling. Usually, researchers focus on evaluating classification performance after the training data is modified. The present study assesses a feature selection technique from a different perspective. We are interested in studying the stability of a feature selection method, especially in understanding the impact of data sampling on the stability of feature selection when using the sampled data. Some interesting findings are reported based on a case study performed on datasets from real-world software projects.