Reinforcement Learning of Optimal Controls

Author: John K. Williams

DOI: 10.1007/978-1-4020-9119-3_15

Keywords: Reinforcement learning algorithm; Learning agent; Reinforcement learning; Markov decision process; Artificial intelligence; Computer science
