Authors: Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver
Keywords:
Abstract: … In this section we derive two new algorithms as stochastic gradient descent in the projected Bellman error objective (5). We first establish some relationships between the relevant …