Authors: Michael M. Zavlanos, Yan Zhang
DOI:
Keywords:
Abstract: In this paper, we propose a distributed off-policy actor-critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update their value functions independently. Then, we introduce an additional consensus step to let the agents asymptotically achieve agreement on the global policy function. The convergence analysis of the proposed algorithm is provided, and its effectiveness is validated using a resource allocation example. Compared to relevant methods, the agents here do not share information about their local tasks, but instead they coordinate to estimate the global policy function.
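To make the two-step structure described in the abstract concrete, the following is a minimal sketch of one round of such an update: each agent takes a local policy-gradient step on its own parameter estimate, then averages that estimate with its neighbors' through a mixing matrix. The function name `consensus_actor_update`, the doubly stochastic matrix `W`, and the random gradient stand-ins are illustrative assumptions, not the paper's actual algorithm or notation.

```python
import numpy as np

def consensus_actor_update(thetas, local_grads, W, step_size=0.01):
    """One illustrative round of a consensus-based policy update (assumed form).

    thetas:      (n_agents, dim) array of local policy-parameter estimates
    local_grads: (n_agents, dim) array of local policy-gradient estimates
    W:           (n_agents, n_agents) doubly stochastic mixing matrix
                 encoding the communication graph
    """
    # Local actor step: each agent ascends its own policy-gradient estimate.
    updated = thetas + step_size * local_grads
    # Consensus step: each agent replaces its estimate with a weighted average
    # of its neighbors' estimates, driving all agents toward agreement.
    return W @ updated

# Toy usage: 3 agents, 4-dimensional policy parameter.
n_agents, dim = 3, 4
rng = np.random.default_rng(0)
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])          # doubly stochastic mixing matrix
thetas = rng.normal(size=(n_agents, dim))   # initial local estimates
grads = rng.normal(size=(n_agents, dim))    # stand-ins for local critic-based gradients
thetas = consensus_actor_update(thetas, grads, W)
```

A doubly stochastic `W` is the usual assumption in consensus schemes because repeated mixing then contracts all local estimates toward their average, which is what lets the agents agree on a common policy parameter without ever exchanging their local task information.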