Learning programs from noisy data

作者: Veselin Raychev , Pavol Bielik , Martin Vechev , Andreas Krause

DOI: 10.1145/2837614.2837671

关键词:

摘要: We present a new approach for learning programs from noisy datasets. Our is based on two concepts: regularized program generator which produces candidate small sample of the entire dataset while avoiding overfitting, and sampler carefully samples by leveraging program's score that dataset. The components are connected in continuous feedback-directed loop. show how to apply this settings: one where has bound noise, another without noise bound. second setting leads way performing approximate empirical risk minimization hypotheses classes formed discrete search space. then kinds synthesizers target settings. First, we introduce novel bitstream synthesizer successfully generates even presence incorrect examples. can detect errors examples combating overfitting -- major problem existing synthesis techniques. also be used grows dynamically via (e.g., provided human). Second, technique constructing statistical code completion systems. These systems trained massive datasets open source programs, known as ``Big Code''. key idea domain specific language (DSL) over trees learn functions DSL directly learned condition predictions made system. This flexible powerful generalizes several works no longer need decide priori what prediction should conditioned (another benefit natural mechanism explaining prediction). As result, our system surpasses capabilities existing, hard-wired

参考文章(36)
Leonardo de Moura, Nikolaj Bjørner, Z3: an efficient SMT solver tools and algorithms for construction and analysis of systems. pp. 337- 340 ,(2008) , 10.1007/978-3-540-78800-3_24
Nichael Lynn Cramer, A Representation for the Adaptive Generation of Simple Sequential Programs international conference on genetic algorithms. pp. 183- 187 ,(1985)
Henry S. Warren, Hacker's Delight ,(2002)
Daniel S. Weld, Tessa Ann Lau, Pedro Domingos, Programming by demonstration: a machine learning approach University of Washington. ,(2001)
Stephen Frederick Smith, A learning system based on genetic adaptive algorithms Ph. D. Thesis, University of Pittsburgh. ,(1980)
Ulrike von Luxburg, Bernhard Schölkopf, Statistical Learning Theory: Models, Concepts, and Results Handbook of the History of Logic. ,vol. 10, pp. 651- 706 ,(2011) , 10.1016/B978-0-444-52936-7.50016-1
James E. Baker, Reducing bias and inefficiency in the selection algorithm international conference on genetic algorithms. pp. 14- 21 ,(1987)
Sumit Gulwani, Dimensions in program synthesis principles and practice of declarative programming. pp. 13- 24 ,(2010) , 10.1145/1836089.1836091
Tung Thanh Nguyen, Anh Tuan Nguyen, Hoan Anh Nguyen, Tien N. Nguyen, A statistical semantic language model for source code foundations of software engineering. pp. 532- 542 ,(2013) , 10.1145/2491411.2491458
Swarat Chaudhuri, Martin Clochard, Armando Solar-Lezama, Bridging boolean and quantitative synthesis using smoothed proof search symposium on principles of programming languages. ,vol. 49, pp. 207- 220 ,(2014) , 10.1145/2535838.2535859