Model-Free Reinforcement Learning with Continuous Action in Practice

Authors: T. Degris, P. M. Pilarski, R. S. Sutton

DOI: 10.1109/ACC.2012.6315022

Abstract: Reinforcement learning methods are often considered as a potential solution to enable a robot to adapt to changes in real time to an unpredictable environment. However, with continuous action, only a few existing algorithms are practical for real-time learning. In such a setting, most effective methods have used a parameterized policy structure, often with a separate parameterized value function. The goal of this paper is to assess such actor-critic methods to form a fully specified practical algorithm. Our specific contributions include 1) …
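
To make the phrase "a parameterized policy structure, often with a separate parameterized value function" concrete, here is a minimal sketch of a one-step actor-critic update with a Gaussian policy over a continuous action. It is an illustration under stated assumptions, not the algorithm specified in the paper: the class name `GaussianActorCritic`, the linear feature vectors, the discounted one-step TD error, the step sizes, and the absence of eligibility traces are all choices made here for brevity.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


class GaussianActorCritic:
    """Hypothetical one-step actor-critic with linear function approximation.

    Actor: Gaussian policy with mean theta_mu . x and
    standard deviation exp(theta_sigma . x).
    Critic: linear state-value estimate w . x.
    All names and defaults are assumptions for illustration.
    """

    def __init__(self, n_features, alpha_v=0.1, alpha_u=0.01, gamma=0.9, seed=0):
        self.w = [0.0] * n_features            # critic weights
        self.theta_mu = [0.0] * n_features     # actor weights for the mean
        self.theta_sigma = [0.0] * n_features  # actor weights for log std dev
        self.alpha_v = alpha_v                 # critic step size
        self.alpha_u = alpha_u                 # actor step size
        self.gamma = gamma                     # discount factor
        self.rng = random.Random(seed)

    def policy(self, x):
        """Return (mean, standard deviation) of the action distribution."""
        mu = dot(self.theta_mu, x)
        sigma = math.exp(dot(self.theta_sigma, x))
        return mu, sigma

    def act(self, x):
        """Sample a continuous action from the current Gaussian policy."""
        mu, sigma = self.policy(x)
        return self.rng.gauss(mu, sigma)

    def update(self, x, a, r, x_next):
        """Apply one critic and one actor update; return the TD error."""
        mu, sigma = self.policy(x)
        # One-step temporal-difference error from the critic.
        delta = r + self.gamma * dot(self.w, x_next) - dot(self.w, x)
        # Gradients of log N(a; mu, sigma^2) with respect to mu and log sigma.
        g_mu = (a - mu) / sigma ** 2
        g_sigma = (a - mu) ** 2 / sigma ** 2 - 1.0
        for i, xi in enumerate(x):
            self.w[i] += self.alpha_v * delta * xi
            self.theta_mu[i] += self.alpha_u * delta * g_mu * xi
            self.theta_sigma[i] += self.alpha_u * delta * g_sigma * xi
        return delta
```

In use, an agent would call `act(x)` on the current feature vector, apply the action, observe the reward `r` and next feature vector `x_next`, and then call `update(x, a, r, x_next)`. The critic's TD error serves as the learning signal for both sets of actor weights, which is what distinguishes this family from value-only methods when the action is continuous.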

References (16)
Richard S. Sutton, Steven D. Whitehead. Online learning with random representations. International Conference on Machine Learning, pp. 314-321 (1993). DOI: 10.1016/B978-1-55860-307-3.50047-2
Manuela Veloso, Michael Bowling. Simultaneous adversarial multi-robot learning. International Joint Conference on Artificial Intelligence, pp. 699-704 (2003).
H. Kimura, T. Yamashita, S. Kobayashi. Reinforcement learning of walking behavior for a four-legged robot. Conference on Decision and Control, vol. 1, pp. 411-416 (2001). DOI: 10.1109/CDC.2001.980135
Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee. Natural actor-critic algorithms. Automatica, vol. 45, pp. 2471-2482 (2009). DOI: 10.1016/J.AUTOMATICA.2009.07.008
P. M. Pilarski, M. R. Dawson, T. Degris, F. Fahimi, J. P. Carey, R. S. Sutton. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning. IEEE International Conference on Rehabilitation Robotics, vol. 2011, pp. 1-7 (2011). DOI: 10.1109/ICORR.2011.5975338
H. Benbrahim, J.S. Doleac, J.A. Franklin, O.G. Selfridge. Real-time learning: a ball on a beam. International Joint Conference on Neural Networks, vol. 1, pp. 98-103 (1992). DOI: 10.1109/IJCNN.1992.287219
R. Tedrake, T.W. Zhang, H.S. Seung. Stochastic policy gradient reinforcement learning on a simple 3D biped. Intelligent Robots and Systems, vol. 3, pp. 2849-2854 (2004). DOI: 10.1109/IROS.2004.1389841
Kenji Doya. Reinforcement Learning in Continuous Time and Space. Neural Computation, vol. 12, pp. 219-245 (2000). DOI: 10.1162/089976600300015961