Increasing diversity: Natural language measures for software fault prediction

Authors: David Binkley, Henry Feild, Dawn Lawrie, Maurizio Pighin

DOI: 10.1016/J.JSS.2009.06.036

Keywords: Machine learning, Data mining, Use Case Points, Artificial intelligence, Software reliability testing, Software maintenance, Software construction, Software regression, Software metric, Software, Software development, Software quality, Software verification and validation, Computer science, Software sizing, Empirical process (process control model), Regression testing

Abstract: While challenging, the ability to predict faulty modules of a program is valuable to a software project because it can reduce the cost of software development, as well as software maintenance and evolution. Three language-processing based measures are introduced and applied to the problem of fault prediction. The first measure is based on the usage of natural language in a program's identifiers. The second measure concerns the conciseness and consistency of identifiers. The third measure, referred to as the QALP score, makes use of techniques from information retrieval to judge software quality. The QALP score has been shown to correlate with human judgments of software quality. Two case studies consider the applicability of the language processing measures to fault prediction using two programs (one open source, one proprietary). Linear mixed-effects regression models are used to identify relationships between defects and the measures. Results, while complex, show that language processing measures improve fault prediction, especially when used in combination. Overall, the models explain one-third and two-thirds of the faults in the two case studies. Consistent with other uses of language processing, the value of the three measures increases with the size of the program module considered.
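
The abstract says the third measure uses information retrieval techniques but does not give the computation. As a rough illustration only, the sketch below scores how closely a module's comments match the vocabulary of its code using cosine similarity over raw term counts. The function names (tokenize, cosine_similarity, comment_code_score), the identifier-splitting rule, and the use of unweighted counts are all assumptions made for this example; this is not the authors' QALP implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and split it into words.

    Identifiers are broken at camelCase boundaries and at any
    non-letter character (so snake_case splits on the underscore).
    This splitting rule is an assumption made for the sketch.
    """
    # Insert a space at lower-to-upper transitions, e.g. totalSize -> total Size.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", spaced.lower())


def cosine_similarity(words_a, words_b):
    """Cosine similarity of two word lists, using raw term counts as weights."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty side has no vocabulary to compare
    return dot / (norm_a * norm_b)


def comment_code_score(comments, code):
    """Hypothetical helper: similarity between comment text and code text."""
    return cosine_similarity(tokenize(comments), tokenize(code))


if __name__ == "__main__":
    code = "int totalSize = computeSize(items);"
    print(comment_code_score("compute the total size", code))
    print(comment_code_score("open the network socket", code))
```

A comment that shares vocabulary with the code scores above zero, while an unrelated comment scores zero. A production measure would likely add stop-word removal and term weighting such as tf-idf, which this sketch omits.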
