Authors: Matilde Sanchez-Fernandez, Santiago Zazo, Sergio Valcarcel
DOI: 10.5281/ZENODO.43656
Keywords:
Abstract: This paper contributes a unified formulation that merges previous analyses of predicting the performance (value function) of a certain sequence of actions (policy) when an agent operates in a Markov decision process with a large state space. When states are represented by features and the value function is linearly approximated, our analysis reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this relationship allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The proposed algorithm is illustrated by simulation, showing competitive results compared with state-of-the-art solutions.
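To make the setting of the abstract concrete, below is a minimal sketch of policy evaluation with a linearly approximated value function over state features. It is not the paper's proposed algorithm (which the abstract does not specify); it uses a generic TD(0) update, and the toy MDP, feature map, discount factor, and step size are illustrative assumptions.

```python
import numpy as np

# Sketch: linear value-function approximation for policy evaluation.
# NOTE: this is a generic TD(0) illustration, not the paper's algorithm;
# the MDP, features, and hyperparameters are made-up assumptions.

rng = np.random.default_rng(0)

n_states, n_features = 20, 4
gamma, alpha = 0.95, 0.05            # assumed discount factor and step size

# Hypothetical fixed feature representation: one feature vector per state.
Phi = rng.normal(size=(n_states, n_features))

# Hypothetical transition matrix (under a fixed policy) and reward vector.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.normal(size=n_states)

theta = np.zeros(n_features)         # linear weights: V(s) ~= Phi[s] @ theta
s = rng.integers(n_states)

for _ in range(50_000):
    s_next = rng.choice(n_states, p=P[s])
    # TD(0) error with the linear approximation of the value function.
    td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += alpha * td_error * Phi[s]
    s = s_next

print("learned weights:", theta)
```

The feature matrix `Phi` plays the role of the state representation mentioned in the abstract; the paper's contribution concerns how different cost functions over such linear approximations relate and how to estimate the weights without bias, which this sketch does not attempt to reproduce.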