Reinforcement Learning Methods for Continuous-Time Markov Decision Problems

Authors: Steven J. Bradtke, Michael O. Duff

Abstract: … After reviewing semi-Markov Decision Problems and Bellman's … the solution of semi-Markov Decision Problems. We demonstrate … sample discount on the value of the next state given a …
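The abstract fragment mentions using a "sample discount on the value of the next state": in a continuous-time semi-Markov problem the sojourn time between decisions is random, so the discount factor applied to the next state's value is itself sampled from the observed transition time. A minimal sketch of such an update is below; the names (`beta`, `alpha`, `reward_rate`, and the dict-based `Q`) are illustrative assumptions, not the paper's own notation, and the reward is assumed to accrue at a constant rate during the transition.

```python
import math

def smdp_q_update(Q, s, a, reward_rate, s_next, tau, beta=0.1, alpha=0.05):
    """One sketched Q-learning step for a continuous-time SMDP.

    tau is the observed sojourn time in state s. With continuous
    discounting at rate beta, a reward accruing at a constant rate
    during the transition integrates to
    (1 - exp(-beta * tau)) / beta * reward_rate.
    """
    discount = math.exp(-beta * tau)                 # discount sampled from tau
    accrued = (1.0 - discount) / beta * reward_rate  # reward earned during the sojourn
    target = accrued + discount * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])            # standard TD-style correction
    return Q[s][a]
```

A longer sojourn `tau` shrinks `discount`, so distant future value counts for less while the accrued-reward term grows toward its cap `reward_rate / beta`, mirroring the continuous-time discounting the abstract alludes to.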

References (11)
Steven Joseph Bradtke, Incremental dynamic programming for on-line adaptive optimal control. University of Massachusetts (1995).
J. N. Tsitsiklis, Asynchronous stochastic approximation and Q-learning. Conference on Decision and Control, pp. 395–400 (1993). 10.1109/CDC.1993.325119
Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh, Learning to act using real-time dynamic programming. Artificial Intelligence, vol. 72, pp. 81–138 (1995). 10.1016/0004-3702(94)00011-O
Eric V. Denardo, Contraction mappings in the theory underlying dynamic programming. SIAM Review, vol. 9, pp. 165–177 (1967). 10.1137/1009030
Richard S. Sutton, Learning to Predict by the Methods of Temporal Differences. Machine Learning, vol. 3, pp. 9–44 (1988). 10.1023/A:1022633531479
B. Hajek, Optimal control of two interacting service stations. IEEE Transactions on Automatic Control, vol. 29, pp. 491–499 (1984). 10.1109/TAC.1984.1103577
Tommi Jaakkola, Michael Jordan, Satinder Singh, Convergence of Stochastic Iterative Dynamic Programming Algorithms. Neural Information Processing Systems, vol. 6, pp. 703–710 (1993). 10.1162/NECO.1994.6.6.1185
C. Darken, J. Chang, J. Moody, Learning rate schedules for faster stochastic gradient search. Neural Networks for Signal Processing II, Proceedings of the 1992 IEEE Workshop, pp. 3–12 (1992). 10.1109/NNSP.1992.253713