TD(λ) Converges with Probability 1

Authors: Peter Dayan, Terrence J. Sejnowski

DOI: 10.1023/A:1022657612745

Abstract: The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions converge to their correct values as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that for a slightly modified form of temporal difference learning the predictions converge with probability one, and shows how to quantify the rate of convergence.
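Since the abstract frames TD(λ) as stochastic approximation in the Robbins-Monro sense, a small simulation helps make the claim concrete. The sketch below runs tabular TD(λ) with accumulating eligibility traces on Sutton's (1988) five-state random walk. It is an illustrative sketch, not the paper's construction: the decaying step-size schedule is an assumed choice that satisfies the usual Robbins-Monro conditions, and the fully online update shown here is the standard textbook form rather than the slightly modified variant the paper analyses.

import numpy as np

rng = np.random.default_rng(0)

N_STATES = 5                                   # non-terminal states of the random walk
TRUE_VALUES = np.arange(1, N_STATES + 1) / (N_STATES + 1)   # analytic solution
LAM, GAMMA = 0.8, 1.0                          # trace decay and (no) discounting

V = np.full(N_STATES, 0.5)                     # prediction vector, initialised at 0.5
t = 0                                          # global update counter for the step size

for episode in range(5000):
    e = np.zeros(N_STATES)                     # eligibility traces, cleared each episode
    s = N_STATES // 2                          # every episode starts in the centre state
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        if s_next < 0:                         # left terminal: outcome 0
            reward, v_next, done = 0.0, 0.0, True
        elif s_next >= N_STATES:               # right terminal: outcome 1
            reward, v_next, done = 1.0, 0.0, True
        else:
            reward, v_next, done = 0.0, V[s_next], False

        delta = reward + GAMMA * v_next - V[s] # temporal-difference error
        e[s] += 1.0                            # accumulating trace for the visited state
        t += 1
        alpha = 100.0 / (100.0 + t)            # assumed Robbins-Monro schedule:
                                               # sum(alpha) diverges, sum(alpha^2) converges
        V += alpha * delta * e                 # propagate the error along the traces
        e *= GAMMA * LAM                       # exponentially decay the traces
        if done:
            break
        s = s_next

print("learned predictions:", np.round(V, 3))
print("true values        :", np.round(TRUE_VALUES, 3))

Under conditions like these the predictions approach the true values 1/6, ..., 5/6 with probability one; the constant in the step-size schedule affects only the rate of convergence, which the article shows how to quantify.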

References (15)
Christopher J.C.H. Watkins, Peter Dayan, "Technical Note: Q-Learning", Machine Learning, vol. 8, pp. 279-292 (1992). DOI: 10.1023/A:1022676722315
Albert Benveniste, Michel Métivier, Pierre Priouret, "Adaptive Algorithms and Stochastic Approximations" (1990).
Richard S. Sutton, "Temporal Credit Assignment in Reinforcement Learning", PhD thesis, University of Massachusetts Amherst (1984).
Herbert Robbins, Sutton Monro, "A Stochastic Approximation Method", Annals of Mathematical Statistics, vol. 22, pp. 400-407 (1951). DOI: 10.1214/AOMS/1177729586
Stuart Geman, Elie Bienenstock, René Doursat, "Neural Networks and the Bias/Variance Dilemma", Neural Computation, vol. 4, pp. 1-58 (1992). DOI: 10.1162/NECO.1992.4.1.1
Richard S. Sutton, "Learning to Predict by the Methods of Temporal Differences", Machine Learning, vol. 3, pp. 9-44 (1988). DOI: 10.1023/A:1022633531479