A unified framework for linear function approximation of value functions in stochastic control

Authors: Matilde Sanchez-Fernandez, Santiago Zazo, Sergio Valcarcel

DOI: 10.5281/ZENODO.43656

Keywords:

Abstract: This paper contributes a unified formulation that merges previous analyses of the prediction of performance (value function) of a certain sequence of actions (policy) when an agent operates in a Markov decision process with a large state-space. When states are represented by features and the value function is linearly approximated, our formulation reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this relationship allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The proposed algorithm is illustrated by simulation, showing competitive results compared with state-of-the-art solutions.
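The abstract describes the general setting of approximating a value function as a linear combination of state features, V(s) ≈ φ(s)ᵀθ. As a minimal sketch of that setting only, and not the algorithm proposed in the paper, the following Python snippet applies a standard semi-gradient TD(0) update to learn the linear weights θ from sampled transitions; the feature matrices, rewards, and step-size values are illustrative assumptions.

```python
import numpy as np

def td0_linear(features, rewards, next_features, gamma=0.99, alpha=0.01, num_sweeps=50):
    """Sketch of linear value-function approximation via semi-gradient TD(0).

    V(s) is approximated as phi(s)^T theta, where phi(s) is the feature
    vector of state s. This illustrates the setting in the abstract, not
    the paper's proposed adaptive algorithm.
    """
    theta = np.zeros(features.shape[1])
    for _ in range(num_sweeps):
        for phi, r, phi_next in zip(features, rewards, next_features):
            # TD error: r + gamma * V(s') - V(s)
            delta = r + gamma * phi_next @ theta - phi @ theta
            # Semi-gradient update of the linear weights
            theta += alpha * delta * phi
    return theta

# Toy usage: three sampled transitions in a 2-dimensional feature space
phi = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
phi_next = np.array([[0.5, 0.5], [0.0, 1.0], [0.0, 0.0]])
r = np.array([0.0, 0.0, 1.0])
print("estimated value weights:", td0_linear(phi, r, phi_next))
```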
