Using continuous action spaces to solve discrete problems

Authors: Hado van Hasselt, Marco A. Wiering

DOI: 10.1109/IJCNN.2009.5178745

Abstract: Real-world control problems are often modeled as Markov Decision Processes (MDPs) with discrete action spaces to facilitate the use of the many reinforcement learning algorithms that exist to find solutions for such MDPs. For many of these problems an underlying continuous action space can be assumed. We investigate the performance of the Cacla algorithm, which uses a continuous actor, on two such MDPs: the mountain car and the cart pole. We show that Cacla has clear advantages over discrete algorithms such as Q-learning and Sarsa, even though its continuous actions get rounded to actions in the same finite action space, which may contain only a small number of actions. In particular, we show that Cacla retains much better performance when the action space is changed by removing some actions after some time of learning.
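
The rounding step mentioned in the abstract can be pictured with a minimal sketch. It assumes a one-dimensional action and nearest-neighbour rounding; the function name and the action values are hypothetical, and this is not the authors' implementation (the Cacla actor and critic updates are omitted entirely).

```python
def round_to_discrete(continuous_action, discrete_actions):
    """Return the available discrete action closest to the actor's continuous output.

    Assumption: actions are scalars and "rounding" means picking the nearest
    member of the finite action set (ties go to the earlier entry).
    """
    return min(discrete_actions, key=lambda a: abs(a - continuous_action))


# Hypothetical action sets: three force values, then the same set with one removed.
full_actions = [-1.0, 0.0, 1.0]
reduced_actions = [-1.0, 1.0]

actor_output = 0.3  # illustrative continuous output of an actor

print(round_to_discrete(actor_output, full_actions))
print(round_to_discrete(actor_output, reduced_actions))
```

Under these assumptions the same continuous output still maps to a valid action after the set shrinks, which is one way to picture why a continuous actor might cope with removed actions; the abstract itself only reports the empirical result.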
