High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan
arXiv preprint, 2015 (2,460 citations)

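For context on the estimator this paper introduces, here is a minimal Python sketch of generalized advantage estimation: it accumulates discounted TD residuals backwards through a trajectory. The function name and list layout are illustrative assumptions, not the authors' reference code.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Sketch of GAE: A_t = sum_l (gamma*lam)^l * delta_{t+l},
    where delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).

    `values` carries one extra entry (the bootstrap value for the
    state after the last reward)."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk the trajectory backwards so each step's advantage can
    # reuse the discounted accumulator from the step after it.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

Setting `lam=0` recovers the one-step TD residual, while `lam=1` recovers the full Monte Carlo return minus the baseline, which is the bias-variance dial the paper's title refers to.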
Gradient Estimation Using Stochastic Computation Graphs
John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel
Neural Information Processing Systems 28, 3528–3536, 2015 (374 citations)

Trust Region Policy Optimization
John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, Philipp Moritz
International Conference on Machine Learning, 1889–1897, 2015 (6,019 citations)

Benchmarking Deep Reinforcement Learning for Continuous Control
Rein Houthooft, Pieter Abbeel, Yan Duan, John Schulman
arXiv preprint, 2016 (1,659 citations)

Theano: A Python framework for fast computation of mathematical expressions
Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller
arXiv preprint, 2016 (868 citations)

VIME: Variational Information Maximizing Exploration
Rein Houthooft, Pieter Abbeel, Filip De Turck, Yan Duan
arXiv preprint, 2016 (702 citations)

Concrete Problems in AI Safety
Dario Amodei, Jacob Steinhardt, John Schulman, Chris Olah
arXiv preprint, 2016 (1,862 citations)

Variational Lossy Autoencoder
Ilya Sutskever, Pieter Abbeel, Tim Salimans, Diederik P. Kingma
International Conference on Learning Representations, 2016 (651 citations)

Equivalence Between Policy Gradients and Soft Q-Learning
John Schulman, Xi Chen, Pieter Abbeel
arXiv preprint, 2017 (255 citations)

Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford
arXiv preprint, 2017 (10,510 citations)

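Since the clipped surrogate objective is the centerpiece of this paper, a short NumPy sketch of it may help; the function name and array shapes are assumptions for illustration, not taken from any official implementation.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Sketch of L^CLIP = E[min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)],
    where r_t is the probability ratio pi_new(a|s) / pi_old(a|s)."""
    unclipped = ratio * advantage
    # Clipping the ratio removes the incentive to move the policy
    # more than epsilon away from the old policy in one update.
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Taking the elementwise minimum makes the bound pessimistic:
    # the clipped term only kicks in when it would lower the objective.
    return np.mean(np.minimum(unclipped, clipped))
```

With `epsilon=0.2`, a ratio of 1.5 on a positive advantage is credited only as 1.2, while a ratio of 0.5 on a negative advantage is still penalized at the clipped value 0.8, which is the asymmetry that keeps updates conservative.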
Meta Learning Shared Hierarchies
Pieter Abbeel, John Schulman, Jonathan Ho, Kevin Frans
arXiv preprint, 2017 (346 citations)

UCB Exploration via Q-Ensembles
Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman
arXiv preprint, 2017 (87 citations)

On First-Order Meta-Learning Algorithms
Alex Nichol, Joshua Achiam, John Schulman
arXiv preprint, 2018 (1,550 citations)

Quantifying Generalization in Reinforcement Learning
John Schulman, Taehoon Kim, Oleg Klimov, Karl Cobbe
arXiv preprint, 2018 (476 citations)

Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
Tim Salimans, Robert Nishihara, Thomas Anthony, Philipp Moritz
arXiv preprint, 2019 (28 citations)

The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors
Ruslan Salakhutdinov, Sam Devlin, John Schulman, Sharada P. Mohanty
arXiv preprint, 2021 (21 citations)

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
Rein Houthooft, Pieter Abbeel, Filip De Turck, Yan Duan
arXiv preprint, 2016 (521 citations)

Model-Based Reinforcement Learning via Meta-Policy Optimization
Pieter Abbeel, Tamim Asfour, John Schulman, Ignasi Clavera
arXiv preprint, 2018 (214 citations)