Relevance Vector Sampling for Reinforcement Learning in Continuous Action Space

Authors: Minwoo Lee, Charles W. Anderson

DOI: 10.1109/ICMLA.2016.0138

Keywords: Computer science; Sampling (statistics); Relevance (information retrieval); Reinforcement learning; Surface (mathematics); Function approximation; Action (philosophy); Kernel (linear algebra); Space (mathematics); Artificial intelligence

Abstract: To be applicable to real-world problems, much reinforcement learning (RL) research has focused on continuous state spaces with function approximation. Some problems also require continuous actions, but searching for good actions in a continuous action space is problematic. This paper suggests a novel relevance vector sampling approach to action search in an RL framework with relevance vector machines (RVM-RL). We hypothesize that each relevance vector (RV) is placed at a mode of the value function approximation surface as learning converges. From this hypothesis, we select RVs to sample actions that maximize the estimated state-action values. We report the efficiency of the proposed approach by controlling a simulated octopus arm with RV-sampled actions.
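The abstract only outlines the sampling idea, so the following is a minimal sketch of how RV-based action selection might look, assuming a fitted RVM value approximator whose relevance vectors are concatenated (state, action) pairs and a hypothetical `q_estimate(state, action)` callable; the function name, the epsilon-greedy exploration term, and the data layout are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def rv_sampled_action(state, relevance_vectors, q_estimate, epsilon=0.1, rng=None):
    """Pick an action from the actions retained by the RVM's relevance vectors.

    Parameters (assumed layout):
        state             : 1-D array of length state_dim, the current state.
        relevance_vectors : array of shape (n_rvs, state_dim + action_dim),
                            the (state, action) pairs kept by the RVM.
        q_estimate        : callable (state, action) -> estimated Q value.
        epsilon           : probability of choosing a random RV action (exploration).
    """
    if rng is None:
        rng = np.random.default_rng()

    state_dim = len(state)
    # The action components of the relevance vectors serve as candidate actions.
    candidate_actions = relevance_vectors[:, state_dim:]

    # Occasionally explore by sampling one of the RV actions uniformly.
    if rng.random() < epsilon:
        return candidate_actions[rng.integers(len(candidate_actions))]

    # Otherwise act greedily: evaluate each RV-sampled action at the current
    # state and return the one with the highest estimated state-action value.
    q_values = np.array([q_estimate(state, a) for a in candidate_actions])
    return candidate_actions[np.argmax(q_values)]
```

In this sketch the continuous action search is reduced to evaluating a small, discrete set of RV-derived candidates, which is the efficiency argument the abstract makes for RV sampling.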
