Authors: Veselin Raychev, Pavol Bielik, Martin Vechev, Andreas Krause
Keywords:
Abstract: We present a new approach for learning programs from noisy datasets. Our approach is based on two new concepts: a regularized program generator which produces a candidate program based on a small sample of the entire dataset while avoiding overfitting, and a dataset sampler which carefully samples the dataset by leveraging the candidate program's score on that dataset. The two components are connected in a continuous feedback-directed loop.

We show how to apply this approach to two settings: one where the dataset has a bound on the noise, and another without a noise bound. The second setting leads to a new way of performing approximate empirical risk minimization on hypotheses classes formed by a discrete search space.

We then present two new kinds of synthesizers which target these noise settings. First, we introduce a novel bitstream synthesizer that successfully generates programs even in the presence of incorrect examples. We show that the synthesizer can detect errors in the examples while combating overfitting -- a major problem in existing synthesis techniques. The technique can also be used in a setting where the dataset grows dynamically via new examples (e.g., provided by a human).

Second, we present a novel technique for constructing statistical code completion systems. These systems are trained on massive datasets of open source programs, known as ``Big Code''. The key idea is to introduce a domain specific language (DSL) over trees and to learn functions in that DSL directly from the dataset. These learned functions then condition the predictions made by the system. This approach is flexible and powerful: it generalizes several existing works, as we no longer need to decide a priori on what the prediction should be conditioned (another benefit is that the learned functions provide a natural mechanism for explaining each prediction). As a result, our code completion system surpasses the capabilities of existing, hard-wired systems.
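To make the generator-sampler feedback loop concrete, here is a minimal sketch in Python. It assumes a toy hypothesis space of (program, size) pairs over integer input-output examples; the function names, the 0.1 size penalty, and the error-biased sampling heuristic are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the feedback-directed loop: a regularized generator proposes a
# candidate from a small sample; a sampler picks the next sample by scoring
# the candidate on the full dataset. All names here are hypothetical.
import random
from typing import Callable, Iterable, List, Tuple

Example = Tuple[int, int]          # (input, expected output)
Program = Callable[[int], int]     # a candidate hypothesis

def regularized_generate(sample: List[Example],
                         candidates: Iterable[Tuple[Program, int]]) -> Program:
    """Pick the candidate minimizing (errors on sample + size penalty)."""
    def cost(prog: Program, size: int) -> float:
        errors = sum(1 for x, y in sample if prog(x) != y)
        return errors + 0.1 * size   # regularizer: size term combats overfitting
    return min(candidates, key=lambda c: cost(*c))[0]

def sample_dataset(dataset: List[Example], prog: Program, k: int) -> List[Example]:
    """Bias the next sample toward examples the candidate currently gets wrong."""
    wrong = [e for e in dataset if prog(e[0]) != e[1]]
    right = [e for e in dataset if prog(e[0]) == e[1]]
    picked = wrong[:k] + random.sample(right, max(0, k - len(wrong)))
    return picked[:k]

def synthesize(dataset: List[Example],
               candidates: List[Tuple[Program, int]],
               k: int = 4, rounds: int = 10) -> Program:
    sample = random.sample(dataset, k)
    best = regularized_generate(sample, candidates)
    for _ in range(rounds):                       # feedback-directed loop
        sample = sample_dataset(dataset, best, k)
        best = regularized_generate(sample, candidates)
    return best

if __name__ == "__main__":
    # Noisy dataset for f(x) = 2*x + 1, with one corrupted example.
    data = [(x, 2 * x + 1) for x in range(20)]
    data[7] = (7, 0)                              # injected noise
    space = [(lambda x: x + 1, 2), (lambda x: 2 * x, 2),
             (lambda x: 2 * x + 1, 3), (lambda x: x * x, 2)]
    prog = synthesize(data, space)
    print(prog(5))  # expect 11 despite the noisy example
```

Because the regularizer charges for errors rather than forbidding them, the correct program can still win even when the corrupted example lands in the sample, which is how the loop tolerates noise.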
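The second idea, learning what a completion should be conditioned on rather than hard-wiring it, can be sketched similarly. The snippet below scores a few candidate "conditioning functions" and keeps the one that predicts best on held-out data; in the paper the learned functions are programs in a DSL over trees, so these string-prefix functions are simplified stand-ins.

```python
# Sketch: enumerate a small space of conditioning functions, train a
# count-based model for each, and keep whichever generalizes best.
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

CtxFn = Callable[[Sequence[str]], str]   # maps a token prefix to a context key

def train(corpus: List[List[str]], ctx: CtxFn) -> Dict[str, Counter]:
    """Count next-token frequencies under the given conditioning function."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for tokens in corpus:
        for i in range(1, len(tokens)):
            model[ctx(tokens[:i])][tokens[i]] += 1
    return model

def accuracy(model: Dict[str, Counter], ctx: CtxFn,
             held_out: List[List[str]]) -> float:
    """Fraction of held-out tokens predicted correctly by the top guess."""
    hits = total = 0
    for tokens in held_out:
        for i in range(1, len(tokens)):
            dist = model.get(ctx(tokens[:i]))
            if dist:
                hits += dist.most_common(1)[0][0] == tokens[i]
            total += 1
    return hits / max(total, 1)

# Candidate conditioning functions: the discrete space to learn over.
CANDIDATES: List[Tuple[str, CtxFn]] = [
    ("last-1", lambda p: p[-1]),
    ("last-2", lambda p: " ".join(p[-2:])),
    ("first+last", lambda p: p[0] + "|" + p[-1]),
]

if __name__ == "__main__":
    train_set = [["x", "=", "foo", "(", ")"], ["y", "=", "bar", "(", ")"]]
    dev_set = [["z", "=", "foo", "(", ")"]]
    best = max(CANDIDATES,
               key=lambda c: accuracy(train(train_set, c[1]), c[1], dev_set))
    print("best conditioning function:", best[0])
```

Picking the conditioning function by held-out score, instead of fixing it a priori, is what lets such a system generalize over hard-wired designs; the chosen function also serves as a human-readable explanation of why a prediction was made.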