Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

Richard S. Sutton, Kristopher De Asis, Silviu Pitis, Daniel Graves
arXiv: Learning

20
2019
Discounted Reinforcement Learning Is Not an Optimization Problem

Hengshuai Yao, Roshan Shariff, Niko Yasui, Abhishek Naik
arXiv: Artificial Intelligence

39
2019
Document-editing Assistants and Model-based Reinforcement Learning as a Path to Conversational AI

Patrick M. Pilarski, Richard S. Sutton, Katya Kudashkina
arXiv: Artificial Intelligence

5
2020
Reactive Reinforcement Learning in Asynchronous Environments

Kory Wallace Mathewson, Patrick M. Pilarski, Jaden B. Travnik, Richard S. Sutton
Frontiers in Robotics and AI 5 79

12
2018
Average-Reward Off-Policy Policy Evaluation with Function Approximation

Shimon Whiteson, Richard S. Sutton, Shangtong Zhang, Yi Wan
arXiv: Learning

2
2021
Planning with Expectation Models for Control

Richard S. Sutton, Yi Wan, Abhishek Naik, Katya Kudashkina
arXiv: Artificial Intelligence

2021
Reward Is Enough

Doina Precup, Satinder Singh, Richard S. Sutton, David Silver
Artificial Intelligence 103535

282
2021
Reinforcement Learning for 3 vs. 2 Keepaway

Peter Stone, Richard S. Sutton, Satinder Singh
Robot Soccer World Cup 249-258

28
2001
Open Theoretical Questions in Reinforcement Learning

Richard S. Sutton
European Conference on Computational Learning Theory 11-17

81
1999
Reinforcement Learning is Direct Adaptive Optimal Control

Richard S. Sutton, Andrew G. Barto, Ronald J. Williams
American Control Conference (28) 2143-2146

710
1991
Associative search network: A reinforcement learning associative memory

Andrew G. Barto, Richard S. Sutton, Peter S. Brouwer
Biological Cybernetics 40 (3) 201-211

149
1981
Experiments with reinforcement learning in problems with continuous state and action spaces

Juan C. Santamaria, Richard S. Sutton, Ashwin Ram
Adaptive Behavior 6 (2) 163-217

403
1998
Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks

Vivek Veeriah, Shangtong Zhang, Richard S. Sutton
European Conference on Machine Learning 445-459

8
2017
Toward a modern theory of adaptive networks: Expectation and prediction

Richard S. Sutton, Andrew G. Barto
Psychological Review 88 (2) 135-170

1,697
1981
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

Richard S. Sutton, Doina Precup, Satinder Singh
Artificial Intelligence 112 (1-2) 181-211

2,012
1999
Reinforcement Learning in Artificial Intelligence

Andrew G. Barto, Richard S. Sutton
Advances in Psychology 121 358-386

48
1997
Beyond reward: the problem of knowledge and data

Richard S. Sutton
Inductive Logic Programming 2-6

3
2011
Associative Learning from Replayed Experience

Elliot A. Ludvig, Mahdieh S. Mirian, E. James Kehoe, Richard S. Sutton
bioRxiv 100800

17
2017
Temporal-difference search in computer Go

David Silver, Richard S. Sutton, Martin Müller
Machine Learning 87 (2) 183-219

137
2012