A temporal-difference model of classical conditioning

Richard S Sutton, Andrew G Barto
Proceedings of the Ninth Annual Conference of the Cognitive Science Society 355-378

264
1987
Model-based reinforcement learning with an approximate, learned model

Leonid Kuvayev, Richard S Sutton
Proceedings of the Ninth Yale Workshop on Adaptive and Learning Systems 101-105

109
1996
Toward off-policy learning control with function approximation.

Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Richard S Sutton
ICML 10 719-726

333
2010
Machine Learning in a Dynamic World: Panel Discussion

Panos J Antsaklis, Kenneth A DeJong, Alan L Meyrowitz, Alexander M Meystel

1988
Reinforcement Learning of Local Shape in the Game of Go.

David Silver, Richard S Sutton, Martin Müller
IJCAI 7 1053-1058

183
2007
Online off-policy prediction

Sina Ghiassian, Andrew Patterson, Martha White, Richard S Sutton
arXiv preprint arXiv:1811.02597

30
2018
Comparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return.

Craig Sherstan, Dylan R Ashley, Brendan Bennett, Kenny Young
UAI 63-72

19
2018
Model-based reinforcement learning with non-linear expectation models and stochastic environments

Yi Wan, Zaheer Abbas, Martha White, Richard S Sutton
FAIM Workshop on Prediction and Generative Modeling in Reinforcement Learning 1-5

6
2018
Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks

Khurram Javed, Haseeb Shah, Richard S Sutton, Martha White
Journal of Machine Learning Research 24 1-34

5
2023
Evaluating Predictive Knowledge

Alex Kearney, Anna Koop, Craig Sherstan, Johannes Gunther
AAAI Fall Symposium on Reasoning and Learning In Real-World Systems For Long-Term Autonomy

1
2018
Comparing policy-gradient algorithms

Richard S Sutton, Satinder Singh, David McAllester
IEEE Transactions on Systems, Man, and Cybernetics

31
2000
Hierarchical optimal control of MDPs

Amy McGovern, Doina Precup, Balaraman Ravindran, Satinder Singh
Proceedings of the Tenth Yale Workshop on Adaptive and Learning Systems 186-191

35
1998
Off-policy temporal-difference learning with function approximation

Doina Precup, Richard S Sutton, Sanjoy Dasgupta
ICML 417-424

462
2001
Intra-Option Learning about Temporally Abstract Actions.

Richard S Sutton, Doina Precup, Satinder Singh
ICML 98 556-564

228
1998
Planning with closed-loop macro actions

Doina Precup, Richard S Sutton, Satinder P Singh
Working notes of the 1997 AAAI Fall Symposium on Model-directed Autonomous Systems 70-76

44
1997
Exponentiated gradient methods for reinforcement learning

Doina Precup, Richard S Sutton
ICML 272-277

31
1997
Multi-time models for reinforcement learning

Doina Precup, Richard S Sutton
Proceedings of the ICML’97 Workshop on Modelling in Reinforcement Learning

18
1997
Neuron-like adaptive elements that can solve difficult learning control-problems

Andrew G Barto, Richard S Sutton, Charles W Anderson
Behavioural Processes 9 (1)

4,653
1984
Introduction to reinforcement learning. Vol. 135

Richard S Sutton, Andrew G Barto
MIT Press Cambridge 5 21-22

1,099
1998