Reinforcement Learning of Optimal Controls

Author: John K. Williams

DOI: 10.1007/978-1-4020-9119-3_15

Keywords: Reinforcement learning algorithm, Learning agent, Reinforcement learning, Markov decision process, Artificial intelligence, Computer science

References (34)

Jennie Si, Andrew G. Barto, Warren B. Powell, Don Wunsch, Handbook of Learning and Approximate Dynamic Programming (2004). DOI: 10.1109/9780470544785

Peter Dayan, Terrence J. Sejnowski, "TD(λ) Converges with Probability 1," Machine Learning, vol. 14, pp. 295-301 (1994). DOI: 10.1023/A:1022657612745

Herbert Robbins, Sutton Monro, "A Stochastic Approximation Method," Annals of Mathematical Statistics, vol. 22, pp. 400-407 (1951). DOI: 10.1214/AOMS/1177729586

A. M. Turing, "Computing Machinery and Intelligence," Mind, vol. 59, pp. 433-460 (1950). DOI: 10.1093/MIND/LIX.236.433

William S. Lovejoy, "A survey of algorithmic methods for partially observed Markov decision processes," Annals of Operations Research, vol. 28, pp. 47-66 (1991). DOI: 10.1007/BF02055574

David Atlas, "Adaptively pointing spaceborne radar for precipitation measurements," Journal of Applied Meteorology, vol. 21, pp. 429-431 (1982). DOI: 10.1175/1520-0450(1982)021<0429:APSRFP>2.0.CO;2

Dimitri P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific (1995).

Dimitris Bertsimas, Sarah Stock Patterson, "The Air Traffic Flow Management Problem with Enroute Capacities," Operations Research, vol. 46, pp. 406-422 (1998). DOI: 10.1287/OPRE.46.3.406

Satinder P. Singh, Richard S. Sutton, "Reinforcement learning with replacing eligibility traces," Machine Learning, vol. 22, pp. 123-158 (1996). DOI: 10.1007/BF00114726

Martin L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming (1994).