Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination

Authors: Dongge Han, Wendelin Böhmer, Michael Wooldridge, Alex Rogers

DOI: 10.1007/978-3-030-29911-8_7

Keywords: Reinforcement learning, Key (cryptography), Set (psychology), Flexibility (engineering), Bellman equation, Predictability, Duration (project management), Computer science, Operations research, Order (business)

Abstract: In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. Therefore, a key issue in multi-agent systems research is that of predicting the behaviours of others, and responding promptly to changes in such behaviours. One obvious possibility is for each agent to broadcast their current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach results in inflexibility of agents if options have an extended duration and are dynamic. While adjusting the executed option at every step improves flexibility from a single-agent perspective, frequent changes in options can induce inconsistency between an agent's actual behaviour and its broadcast intention. In order to balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows the agents to flexibly terminate their options. We evaluate our model empirically on a set of multi-agent pursuit and taxi tasks, and show that our agents learn to adapt across scenarios that require different termination behaviours.
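The core trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name, the dictionary-based Q-table, and the switching penalty `delta` are all illustrative assumptions. The idea is that at every step the agent compares the value of continuing its currently broadcast option against the best value of terminating and switching, where the penalty `delta` discourages frequent switches that would make the agent's behaviour inconsistent with its broadcast intention.

```python
def dynamic_termination_value(q_row, current_option, delta):
    """Decide whether to terminate the current option in one state.

    q_row          -- dict mapping option -> Q(s, option) (assumed tabular)
    current_option -- the option the agent is currently executing
    delta          -- penalty discouraging frequent option switches
    Returns (value, chosen_option, terminated).
    """
    continue_value = q_row[current_option]
    best_option = max(q_row, key=q_row.get)
    switch_value = q_row[best_option] - delta
    if switch_value > continue_value:
        # Switching is worth the penalty: terminate and adopt the new option.
        return switch_value, best_option, True
    # Otherwise keep executing the current option, preserving predictability.
    return continue_value, current_option, False


# Toy example: option 'b' looks slightly better than 'a', but whether the
# agent actually switches depends on the size of the termination penalty.
q_row = {'a': 1.0, 'b': 1.3}
print(dynamic_termination_value(q_row, 'a', delta=0.5))  # penalty too large: keep 'a'
print(dynamic_termination_value(q_row, 'a', delta=0.1))  # worth it: switch to 'b'
```

A larger `delta` yields more predictable, committed agents; a smaller `delta` yields more flexible ones, which is exactly the balance the dynamic termination Bellman equation is designed to tune.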

References (23)
Doina Precup, Satinder P. Singh, Richard S. Sutton. Intra-Option Learning about Temporally Abstract Actions. International Conference on Machine Learning, pp. 556–564 (1998).
Michael N. Huhns, Munindar P. Singh (eds.). Readings in Agents. Morgan Kaufmann Publishers (1997).
Victor Lesser, Milind Tambe, Charles L. Ortiz (eds.). Distributed Sensor Networks: A Multiagent Perspective. Kluwer Academic Publishers (2003).
Volodymyr Mnih, Ioannis Antonoglou, Koray Kavukcuoglu, Daan Wierstra, Martin A. Riedmiller, Alex Graves, David Silver. Playing Atari with Deep Reinforcement Learning. arXiv (2013).
Rajbala Makar, Sridhar Mahadevan, Mohammad Ghavamzadeh. Hierarchical Multi-Agent Reinforcement Learning. Proceedings of the Fifth International Conference on Autonomous Agents (AGENTS '01), pp. 246–253 (2001). DOI: 10.1145/375735.376302
Richard S. Sutton, Doina Precup, Satinder Singh. Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, vol. 112, pp. 181–211 (1999). DOI: 10.1016/S0004-3702(99)00052-1
Nick R. Jennings. Commitments and Conventions: The Foundation of Coordination in Multi-Agent Systems. Knowledge Engineering Review, vol. 8, pp. 223–250 (1993). DOI: 10.1017/S0269888900000205
T. G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, vol. 13, pp. 227–303 (2000). DOI: 10.1613/JAIR.639
R. S. Sutton, A. G. Barto. Reinforcement Learning: An Introduction. MIT Press (1998).
Gerald Tesauro. Temporal Difference Learning and TD-Gammon. Communications of the ACM, vol. 38, pp. 58–68 (1995). DOI: 10.1145/203330.203343