Evaluating the evaluations of code recommender systems: a reality check

Authors: Sebastian Proksch, Sven Amann, Sarah Nadi, Mira Mezini

DOI: 10.1145/2970276.2970330

Keywords: Recommender system, Code (cryptography), Data mining, Reality check, Quality (business), Context (language use), Computer science, Software, Information retrieval, Empirical research

Abstract: While researchers develop many new exciting code recommender systems, such as method-call completion, code-snippet completion, or code search, an accurate evaluation of such systems is always a challenge. We analyzed the current literature and found that most evaluations rely on artificial queries extracted from released code, which begs the question: Do such evaluations reflect real-life usages? To answer this question, we capture 6,189 fine-grained development histories from real IDE interactions. We use them as ground truth and extract 7,157 real queries for a specific recommender system. We compare the results of these queries with different artificial evaluation strategies and check several assumptions that are repeatedly used in research, but never empirically evaluated. We find that an evolving context, which is often observed in practice, has a major effect on prediction quality but is not commonly reflected in artificial evaluations.
