Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination

Authors: Dongge Han, Wendelin Böhmer, Michael Wooldridge, Alex Rogers

DOI: 10.1007/978-3-030-29911-8_7

Keywords: Reinforcement learning, Key (cryptography), Set (psychology), Flexibility (engineering), Bellman equation, Predictability, Duration (project management), Computer science, Operations research, Order (business)

Abstract: In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. Therefore, a key issue in multi-agent systems research is that of predicting the behaviours of others, and responding promptly to changes in such behaviours. One obvious possibility is for each agent to broadcast their current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach results in inflexibility of agents if options have an extended duration and are dynamic. While adjusting the executed option at every step improves flexibility from a single-agent perspective, frequent changes in options can induce inconsistency between an agent's actual behaviour and its broadcast intention. In order to balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows the agents to flexibly terminate their options. We evaluate our model empirically on a set of multi-agent pursuit and taxi tasks, and show that our agents learn to adapt across scenarios that require different termination behaviours.
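The core trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name, the dictionary-based Q-table, and the switching penalty `delta` are all illustrative assumptions. The idea is that at every step the agent compares the value of continuing its currently broadcast option against the best value of terminating and switching, where the penalty `delta` discourages frequent switches that would make the agent's behaviour inconsistent with its broadcast intention.

```python
def dynamic_termination_value(q_row, current_option, delta):
    """Decide whether to terminate the current option in one state.

    q_row          -- dict mapping option -> Q(s, option) (assumed tabular)
    current_option -- the option the agent is currently executing
    delta          -- penalty discouraging frequent option switches
    Returns (value, chosen_option, terminated).
    """
    continue_value = q_row[current_option]
    best_option = max(q_row, key=q_row.get)
    switch_value = q_row[best_option] - delta
    if switch_value > continue_value:
        # Switching is worth the penalty: terminate and adopt the new option.
        return switch_value, best_option, True
    # Otherwise keep executing the current option, preserving predictability.
    return continue_value, current_option, False


# Toy example: option 'b' looks slightly better than 'a', but whether the
# agent actually switches depends on the size of the termination penalty.
q_row = {'a': 1.0, 'b': 1.3}
print(dynamic_termination_value(q_row, 'a', delta=0.5))  # penalty too large: keep 'a'
print(dynamic_termination_value(q_row, 'a', delta=0.1))  # worth it: switch to 'b'
```

A larger `delta` yields more predictable, committed agents; a smaller `delta` yields more flexible ones, which is exactly the balance the dynamic termination Bellman equation is designed to tune.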

References (23)
Doina Precup, Satinder P. Singh, Richard S. Sutton. Intra-Option Learning about Temporally Abstract Actions. International Conference on Machine Learning, pp. 556–564 (1998).
Michael N. Huhns, Munindar P. Singh (eds.). Readings in Agents. Morgan Kaufmann Publishers (1997).
Victor Lesser, Milind Tambe, Charles L. Ortiz (eds.). Distributed Sensor Networks: A Multiagent Perspective. Kluwer Academic Publishers (2003).
Volodymyr Mnih, Ioannis Antonoglou, Koray Kavukcuoglu, Daan Wierstra, Martin A. Riedmiller, Alex Graves, David Silver. Playing Atari with Deep Reinforcement Learning. arXiv (2013).
Rajbala Makar, Sridhar Mahadevan, Mohammad Ghavamzadeh. Hierarchical Multi-Agent Reinforcement Learning. Proceedings of the Fifth International Conference on Autonomous Agents (AGENTS '01), pp. 246–253 (2001). DOI: 10.1145/375735.376302
Richard S. Sutton, Doina Precup, Satinder Singh. Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, vol. 112, pp. 181–211 (1999). DOI: 10.1016/S0004-3702(99)00052-1
Nick R. Jennings. Commitments and Conventions: The Foundation of Coordination in Multi-Agent Systems. Knowledge Engineering Review, vol. 8, pp. 223–250 (1993). DOI: 10.1017/S0269888900000205
T. G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, vol. 13, pp. 227–303 (2000). DOI: 10.1613/JAIR.639
R. S. Sutton, A. G. Barto. Reinforcement Learning: An Introduction. MIT Press (1998).
Gerald Tesauro. Temporal Difference Learning and TD-Gammon. Communications of the ACM, vol. 38, pp. 58–68 (1995). DOI: 10.1145/203330.203343