Authors: Hado van Hasselt, Marco A. Wiering
DOI: 10.1109/IJCNN.2009.5178745
Keywords:
Abstract: Real-world control problems are often modeled as Markov Decision Processes (MDPs) with discrete action spaces to facilitate the use of the many reinforcement learning algorithms that exist to find solutions for such MDPs. For many of these problems, an underlying continuous action space can be assumed. We investigate the performance of the Cacla algorithm, which uses a continuous actor, on two such MDPs: the mountain car and the cart pole. We show that Cacla has clear advantages over Q-learning and Sarsa, even though its actions get rounded to actions in the same finite action space, which may contain only a small number of actions. In particular, we show that Cacla retains much better performance when the action space is changed by removing some actions after some time of learning.
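The abstract's central idea is that Cacla learns a continuous-valued actor alongside a state-value critic, updating the actor toward the explored action only when the temporal-difference error is positive. The following is a minimal sketch of that update rule, assuming linear function approximators over state features and illustrative hyperparameter values; the feature map, learning rates, and function names here are not from the paper.

```python
import numpy as np

def cacla_step(theta_v, theta_a, phi_s, phi_s_next, action_taken, reward,
               alpha=0.1, beta=0.1, gamma=0.99, done=False):
    """One Cacla-style update with linear critic and actor over features phi.

    Hypothetical sketch: the critic estimates V(s) = theta_v . phi(s); the
    actor outputs a continuous action theta_a . phi(s). The actor is pulled
    toward the action actually taken only when the TD error is positive.
    """
    v_s = theta_v @ phi_s
    v_next = 0.0 if done else theta_v @ phi_s_next
    delta = reward + gamma * v_next - v_s            # TD error
    theta_v = theta_v + alpha * delta * phi_s        # critic update toward TD target
    if delta > 0:                                    # Cacla rule: update actor only on improvement
        actor_out = theta_a @ phi_s
        theta_a = theta_a + beta * (action_taken - actor_out) * phi_s
    return theta_v, theta_a, delta
```

Because the actor's output is a real number, it can be rounded to the nearest available discrete action at execution time, which is how the paper compares Cacla against discrete-action learners such as Q-learning and Sarsa on the same finite action sets.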