Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

Abstract: In this paper, we propose a distributed off-policy actor-critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update their value function independently. Then, we introduce an additional consensus step to let the agents asymptotically achieve agreement on the policy function. A convergence analysis of the proposed algorithm is provided, and its effectiveness is validated using a resource allocation example. Compared to relevant methods, here the agents do not share information about their tasks, but instead they coordinate to estimate the global policy.
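The consensus step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the mixing rule, so everything below is an assumption made for illustration: the 3-agent network, the doubly stochastic weight matrix `W`, the starting parameter values, and the helper names `consensus_step` and `disagreement`. The sketch shows only how repeated weighted averaging of local policy-parameter estimates drives the agents toward agreement. It omits the local actor and critic updates entirely.

```python
def consensus_step(thetas, weights):
    """Mix local policy-parameter estimates: theta_i <- sum_j W[i][j] * theta_j."""
    n, dim = len(thetas), len(thetas[0])
    return [
        [sum(weights[i][j] * thetas[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]


def disagreement(thetas):
    """Largest coordinate-wise gap between any agent's estimate and the network average."""
    n, dim = len(thetas), len(thetas[0])
    mean = [sum(t[k] for t in thetas) / n for k in range(dim)]
    return max(abs(t[k] - mean[k]) for t in thetas for k in range(dim))


# Hypothetical 3-agent network. W is symmetric and doubly stochastic
# (rows and columns sum to 1), an assumed choice that preserves the average.
W = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]

# Hypothetical local estimates of a 2-dimensional policy parameter.
thetas = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]

initial_gap = disagreement(thetas)
for _ in range(50):
    thetas = consensus_step(thetas, W)
final_gap = disagreement(thetas)

print("initial disagreement:", initial_gap)
print("final disagreement:", final_gap)
print("agent 0 estimate:", thetas[0])
```

With a doubly stochastic `W`, the network average of the estimates is unchanged by each mixing step, so the agents converge to the average of their starting values. In a full algorithm, this mixing would be interleaved with each agent's local updates rather than run on its own.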
