Predicting Program Properties from "Big Code"

Authors: Veselin Raychev, Martin Vechev, Andreas Krause

DOI: 10.1145/2676726.2677009

Abstract: We present a new approach for predicting program properties from massive codebases (aka "Big Code"). Our approach first learns a probabilistic model from existing data and then uses this model to predict properties of new, unseen programs. The key idea of our work is to transform the input program into a representation which allows us to phrase the problem of inferring program properties as structured prediction in machine learning. This formulation enables us to leverage powerful probabilistic graphical models such as conditional random fields (CRFs) in order to perform joint prediction of program properties. As an example of our approach, we built a scalable prediction engine called JSNice for solving two kinds of problems in the context of JavaScript: predicting (syntactic) names of identifiers and predicting (semantic) type annotations of variables. Experimentally, JSNice predicts correct names for 63% of name identifiers, and its type annotation predictions are correct in 81% of the cases. In the first week since its release, JSNice was used by more than 30,000 developers, and in only a few months it has become a popular tool in the JavaScript developer community. By formulating the problem of inferring program properties as structured prediction and showing how to perform both learning and inference in this context, our work opens up new possibilities for attacking a wide range of difficult problems in the context of "Big Code", including invariant generation, decompilation, synthesis and others.
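The joint prediction described in the abstract can be illustrated with a toy sketch. It is not the JSNice implementation: the node names, relations, candidate lists, and weights below are all hypothetical, and exhaustive search stands in for whatever inference procedure the paper actually uses. The sketch only shows the general shape of the idea: unknown identifiers are nodes, relations between program elements are edges, and the predicted names are the joint assignment with the highest total score.

```python
from itertools import product

# Hypothetical "learned" weights: (label_a, relation, label_b) -> score.
# In a real system these would be estimated from a large corpus.
WEIGHTS = {
    ("i", "index_of", "arr"): 0.9,
    ("i", "less_than", "length"): 0.7,
    ("arr", "has_property", "length"): 0.8,
    ("s", "index_of", "arr"): 0.1,
    ("i", "index_of", "str"): 0.4,
    ("str", "has_property", "length"): 0.6,
}

# Unknown nodes (e.g. minified identifiers) with hypothetical candidate names.
CANDIDATES = {"a": ["arr", "str"], "b": ["i", "s"]}

# Known nodes keep a fixed label (e.g. a property name that is not renamed).
KNOWN = {"length": "length"}

# Edges of the dependency graph: (node_a, relation, node_b).
EDGES = [
    ("b", "index_of", "a"),
    ("b", "less_than", "length"),
    ("a", "has_property", "length"),
]


def score(assignment):
    """Sum the pairwise weights of all edges under a full labeling."""
    labels = {**KNOWN, **assignment}
    return sum(WEIGHTS.get((labels[x], rel, labels[y]), 0.0) for x, rel, y in EDGES)


def map_inference():
    """Return the highest-scoring joint assignment by exhaustive search."""
    nodes = sorted(CANDIDATES)
    best, best_score = None, float("-inf")
    for combo in product(*(CANDIDATES[n] for n in nodes)):
        assignment = dict(zip(nodes, combo))
        current = score(assignment)
        if current > best_score:
            best, best_score = assignment, current
    return best, best_score


if __name__ == "__main__":
    names, total = map_inference()
    print(names, total)
```

With these made-up weights the search picks `arr` for `a` and `i` for `b`, because the two choices reinforce each other across the shared edge. Exhaustive search is exponential in the number of unknown nodes, so it is only suitable for illustration; a scalable engine would need an approximate inference procedure.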

References (27)
Simon Holm Jensen, Anders Møller, Peter Thiemann. Type Analysis for JavaScript. Static Analysis Symposium, pp. 238-255 (2009). DOI: 10.1007/978-3-642-03237-0_17
J. Andrew Bagnell, Martin A. Zinkevich, Nathan D. Ratliff. (Approximate) Subgradient Methods for Structured Prediction. International Conference on Artificial Intelligence and Statistics, pp. 380-387 (2007).
Daphne Koller, Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. The MIT Press (2009).
David Andrzejewski, Anne Mulhern, Ben Liblit, Xiaojin Zhu. Statistical Debugging Using Latent Topic Models. European Conference on Machine Learning, pp. 6-17 (2007). DOI: 10.1007/978-3-540-74958-5_5
Julian Besag. On the Statistical Analysis of Dirty Pictures. Journal of the Royal Statistical Society, Series B (Methodological), vol. 48, pp. 259-279 (1986). DOI: 10.1111/J.2517-6161.1986.TB01412.X
Ted Kremenek, Andrew Y. Ng, Dawson Engler. A Factor Graph Model for Software Bug Finding. International Joint Conference on Artificial Intelligence, pp. 2510-2516 (2007).
Ted Kremenek, Paul Twohey, Andrew Ng, Godmar Back, Dawson Engler. From Uncertainty to Belief: Inferring the Specification Within. Operating Systems Design and Implementation, pp. 161-176 (2006). DOI: 10.5555/1298455.1298471
David Pinto, Andrew McCallum, Xing Wei, W. Bruce Croft. Table Extraction Using Conditional Random Fields. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 235-242 (2003). DOI: 10.1145/860435.860479