Imitation-Projected Programmatic Reinforcement Learning.

Authors: Yisong Yue, Swarat Chaudhuri, Hoang M. Le, Abhinav Verma

Abstract: We study the problem of programmatic reinforcement learning, in which policies are represented as short programs in a symbolic language. Programmatic policies can be more …
