Authors: Yuan Tian, Nasir Ali, David Lo, Ahmed E. Hassan
DOI: 10.1007/S10664-015-9409-1
Keywords: Machine learning, Software, Data quality, Artificial intelligence, Data mining, Software system, Software problem, Reliability (statistics), Computer science
Abstract: Severity levels, e.g., critical and minor, of bugs are often used to prioritize development efforts. Prior research efforts have proposed approaches that automatically assign a severity label to a bug report. All prior efforts verify the accuracy of their approaches using the human-assigned severity labels of bug reports stored in software repositories. However, they all assume that such labels are reliable; hence, a perfect automated approach should assign the same severity label as the one recorded in the repository, achieving 100 % accuracy. Looking at duplicate bug reports (i.e., reports referring to the same problem) from three open-source systems (OpenOffice, Mozilla, Eclipse), we find that around 51 % of duplicate reports carry inconsistent human-assigned severity labels even though they refer to the same problem. While our results directly indicate unreliable labels only for duplicate reports, we believe they send warning signals about the reliability of the full severity data (i.e., including non-duplicate reports). Future efforts should explore whether our findings generalize to the full dataset. Moreover, they should factor in the unreliable nature of the severity data. Given this unreliability, classical metrics for assessing the accuracy of models/learners are not suitable for assessing approaches that automatically assign a severity label. Hence, we propose a new approach to assess the performance of such models. Our new assessment shows that current approaches perform well, achieving 77-86 % agreement with the human-assigned labels.
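As a rough illustration of the duplicate-based consistency check described in the abstract, the following Python sketch groups bug reports by the problem they refer to and computes the fraction of duplicate groups whose members disagree on severity. The input format, names, and example records are hypothetical, not the authors' tooling or data; in the study, duplicate groups would come from the issue tracker's duplicate links.

```python
from collections import defaultdict

# Hypothetical input: (report_id, duplicate_group_id, human_assigned_severity).
reports = [
    ("r1", "g1", "critical"),
    ("r2", "g1", "minor"),   # disagrees with r1 -> inconsistent group
    ("r3", "g2", "major"),
    ("r4", "g2", "major"),   # agrees with r3 -> consistent group
]

def inconsistency_rate(reports):
    """Fraction of duplicate groups (>= 2 reports) with differing severity labels."""
    labels_by_group = defaultdict(list)
    for _, group_id, severity in reports:
        labels_by_group[group_id].append(severity)
    # Only groups with at least two reports can be checked for consistency.
    duplicate_groups = [labels for labels in labels_by_group.values() if len(labels) >= 2]
    if not duplicate_groups:
        return 0.0
    inconsistent = sum(1 for labels in duplicate_groups if len(set(labels)) > 1)
    return inconsistent / len(duplicate_groups)

print(f"{inconsistency_rate(reports):.0%} of duplicate groups have inconsistent severities")
```

On the paper's data, a measure along these lines would yield the reported ~51 % inconsistency among duplicate reports.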