Structure in the Space of Value Functions

Authors: David Foster, Peter Dayan

DOI: 10.1023/A:1017944732463

Keywords:

Abstract: Solving in an efficient manner many different optimal control tasks within the same underlying environment requires decomposing the environment into its computationally elemental fragments. We suggest how to find fragmentations using unsupervised, mixture model, learning methods on data derived from value functions for multiple tasks, and show that these fragmentations are in accord with observable structure in the environments. Further, we present evidence that such fragments can be of use in a practical reinforcement learning context, by facilitating online, actor-critic learning of multiple goals MDPs.

References (41)
Ken Currie, Austin Tate, O-Plan: The Open Planning Architecture. Artificial Intelligence, vol. 52, pp. 49-86 (1991). DOI: 10.1016/0004-3702(91)90024-E
A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, pp. 1-22 (1977). DOI: 10.1111/J.2517-6161.1977.TB01600.X
C. Boutilier, T. Dean, S. Hanks, Decision-Theoretic Planning: Structural Assumptions and Computational Leverage. Journal of Artificial Intelligence Research, vol. 11, pp. 1-94 (1999). DOI: 10.1613/JAIR.575
Jyrki Kivinen, Manfred K. Warmuth, Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information & Computation, vol. 132, pp. 1-63 (1997). DOI: 10.1006/INCO.1996.2612
Andrew G. Barto, Richard S. Sutton, Charles W. Anderson, Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems. IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, pp. 834-846 (1983). DOI: 10.1109/TSMC.1983.6313077
J.-P. Forestier, P. Varaiya, Multilayer Control of Large Markov Chains. IEEE Transactions on Automatic Control, vol. 23, pp. 298-305 (1978). DOI: 10.1109/TAC.1978.1101707
Thomas M. Cover, Joy A. Thomas, Elements of Information Theory (1991).
Geoffrey J. Gordon, Stable Fitted Reinforcement Learning. Neural Information Processing Systems, pp. 1052-1058 (1995).
D. Precup, S. Singh, R. S. Sutton, Between MDPs and Semi-MDPs: Learning, Planning & Representing Knowledge at Multiple Temporal Scales. University of Massachusetts (1998).
Geoffrey E. Hinton, Richard S. Zemel, Autoencoders, Minimum Description Length and Helmholtz Free Energy. Neural Information Processing Systems, vol. 6, pp. 3-10 (1993).