Authors: Yu Jing, Song Wang
DOI:
Keywords:
Abstract: In this paper, we propose the surrogate agent-environment interface (SAEI) in reinforcement learning. We also state that learning based on a probability surrogate agent-environment interface provides the optimal policy of the task agent-environment interface. We introduce the surrogate probability action and develop the probability surrogate action deterministic policy gradient (PSADPG) algorithm based on SAEI. This algorithm enables continuous control of discrete actions. The experiments show that PSADPG achieves the performance of DQN in certain tasks of a stochastic nature in the initial training stage.