Distributed Off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

Authors: Michael M. Zavlanos, Yan Zhang

Abstract: In this paper, we propose a distributed off-policy actor-critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update their value function estimates independently. Then, we introduce an additional consensus step that lets all agents asymptotically reach agreement on the global optimal policy function. A convergence analysis of the proposed algorithm is provided, and its effectiveness is validated using a resource allocation example. Compared to relevant methods, the agents here do not share information about their local tasks; instead, they coordinate to estimate the global optimal policy.
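The consensus idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes that each agent's off-policy actor-critic gradient can be replaced by the gradient of a hypothetical private quadratic objective, and that the agents communicate over a ring graph with a doubly stochastic mixing matrix. Each agent takes a local ascent step on its own estimate of the policy parameter and then averages that estimate with its neighbors' estimates. Only parameter estimates are exchanged; the local targets, which stand in for each agent's task, stay private. The names `ring_weights`, `local_gradient`, `consensus_actor_step`, and `run_demo` are illustrative.

```python
import numpy as np


def ring_weights(n: int) -> np.ndarray:
    """Doubly stochastic mixing matrix for a ring (assumed topology).

    Each agent gives weight 1/3 to itself and to each of its two neighbors,
    so every row and every column sums to 1.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def local_gradient(theta: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for each agent's local policy-gradient estimate.

    Agent i privately maximizes J_i(theta_i) = -0.5 * ||theta_i - targets[i]||^2,
    whose gradient is targets[i] - theta_i. Targets are never communicated.
    """
    return targets - theta


def consensus_actor_step(theta: np.ndarray, targets: np.ndarray,
                         W: np.ndarray, step_size: float) -> np.ndarray:
    """One local gradient-ascent step per agent, then a consensus (mixing) step.

    theta has shape (n_agents, dim); row i is agent i's local parameter estimate.
    """
    local = theta + step_size * local_gradient(theta, targets)
    return W @ local  # each agent averages its neighbors' updated estimates


def run_demo(num_iters: int = 500, step_size: float = 0.1) -> np.ndarray:
    """Run the sketch with four agents and illustrative scalar targets."""
    targets = np.array([[0.0], [1.0], [2.0], [3.0]])  # private local optima
    theta = np.zeros_like(targets)                      # initial local estimates
    W = ring_weights(targets.shape[0])
    for _ in range(num_iters):
        theta = consensus_actor_step(theta, targets, W, step_size)
    return theta


if __name__ == "__main__":
    print(run_demo().ravel())
```

Because the mixing matrix is doubly stochastic, the network average of the estimates follows an ordinary gradient step on the sum of the local objectives, so it converges to the mean of the private targets (1.5 here). With a constant step size the individual estimates agree only approximately (in this example, to within about 0.1 of that mean); convergence analyses of such schemes typically use diminishing step sizes to remove the residual disagreement.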
