Multi-Agent Hierarchical Reinforcement Learning with Dynamic Termination

Authors: Alex Rogers, Michael Wooldridge, Wendelin Boehmer, Dongge Han

DOI: 10.1007/978-3-030-29911-8_7

Keywords: Perspective (graphical), Bellman equation, Flexibility (engineering), Key (cryptography), Order (business), Computer science, Risk analysis (engineering), Reinforcement learning, Duration (project management)

Abstract: In a multi-agent system, an agent's optimal policy will typically depend on the policies of other agents. Predicting the behaviours of others, and responding promptly to changes in such behaviours, is therefore a key issue in multi-agent systems research. One obvious possibility is for each agent to broadcast its current intention, for example, the currently executed option in a hierarchical RL framework. However, this approach results in inflexible agents when options have an extended duration. While adjusting the executed option at each step improves flexibility from a single-agent perspective, frequent changes can induce inconsistency between an agent's actual behaviour and its broadcast intention. In order to balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows agents to flexibly terminate their options.
