DeFlaker: automatically detecting flaky tests

Authors: Jonathan Bell, Owolabi Legunsen, Michael Hilton, Lamyaa Eloussi, Tifany Yung

DOI: 10.1145/3180155.3180164

Keywords:

Abstract: Developers often run tests to check that their latest changes to a code repository did not break any previously working functionality. Ideally, new test failures would indicate regressions caused by the changes. However, some failures may be due not to the changes but to non-determinism in the tests, popularly called flaky tests. The typical way to detect flaky tests is to rerun failing tests repeatedly. Unfortunately, rerunning failing tests can be costly and can slow down the development cycle. We present the first extensive evaluation of rerunning failing tests and propose a new technique, DeFlaker, that detects whether a failure is due to a flaky test without rerunning it and with very low runtime overhead. DeFlaker monitors the coverage of the latest code changes and marks as flaky any newly failing test that did not execute any of the changed code. We deployed DeFlaker live, in the build process of 96 Java projects on Travis CI, and found 87 previously unknown flaky tests in 10 of these projects. We also ran experiments on project histories, where DeFlaker detected 1,874 flaky tests from 4,846 failures, with a low false alarm rate (1.5%). DeFlaker had a higher recall (95.5% vs. 23%) of confirmed flaky tests than Maven's default flaky-test detector.
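The core check described in the abstract (mark a newly failing test as flaky if it did not execute any changed code) can be illustrated with a minimal Java sketch. The class and method names here (FailingTest, isLikelyFlaky) and the string encoding of code elements are hypothetical, chosen for illustration; this is not DeFlaker's actual implementation, which instruments coverage during the build.

    import java.util.HashSet;
    import java.util.Set;

    // Minimal sketch of the differential-coverage flakiness check:
    // a newly failing test is marked flaky if it covered none of the
    // code elements changed since the last passing run.
    public class FlakyCheckSketch {

        // A newly failing test plus the code elements it covered,
        // encoded here as "ClassName:lineNumber" strings.
        static final class FailingTest {
            final String name;
            final Set<String> coveredElements;
            FailingTest(String name, Set<String> coveredElements) {
                this.name = name;
                this.coveredElements = coveredElements;
            }
        }

        // Flaky if the test executed none of the changed elements.
        static boolean isLikelyFlaky(FailingTest test, Set<String> changedElements) {
            for (String element : test.coveredElements) {
                if (changedElements.contains(element)) {
                    return false; // failure may be a real regression
                }
            }
            return true; // failure cannot be explained by the latest changes
        }

        public static void main(String[] args) {
            Set<String> changed = new HashSet<>();
            changed.add("Parser:42");           // statement modified in the latest commit

            Set<String> covered = new HashSet<>();
            covered.add("HttpClient:17");       // failing test never reached the change

            FailingTest failing = new FailingTest("testFetchTimesOut", covered);
            System.out.println(failing.name + " flaky? " + isLikelyFlaky(failing, changed));
        }
    }

A test flagged this way would normally still be reported to developers as flaky rather than silently ignored, since the check is an inference from coverage, not proof of non-determinism.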
