Authors: Dongge Han, Wendelin Böhmer, Michael Wooldridge, Alex Rogers
DOI: 10.1007/978-3-030-29911-8_7
Keywords: Reinforcement learning, Key (cryptography), Set (psychology), Flexibility (engineering), Bellman equation, Predictability, Duration (project management), Computer science, Operations research, Order (business)
Abstract: In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. Therefore, a key issue in multi-agent systems research is that of predicting the behaviours of others, and responding promptly to changes in such behaviours. One obvious possibility is for each agent to broadcast its current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach results in inflexibility of agents if options have an extended duration and the environment is dynamic. While adjusting the executed option at each step improves flexibility from a single-agent perspective, frequent changes in options can induce inconsistency between an agent's actual behaviour and its broadcast intention. In order to balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows agents to flexibly terminate their options. We evaluate our model empirically on a set of pursuit and taxi tasks, and show that our agents learn to adapt across scenarios that require different termination behaviours.
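The abstract names a dynamic termination Bellman equation without stating it. As a rough illustration only (the symbols $\tilde{Q}$, $Q$, and the switching-cost term $\delta$ are assumptions for exposition, not taken from the paper), an option-value recursion with a per-step termination choice might take a form like:

```latex
% Hypothetical sketch, not the paper's actual equation.
% Q(s, o): value of running option o in state s
% delta:   assumed cost for switching options, discouraging frequent
%          changes so broadcast intentions stay predictable
\[
\tilde{Q}(s, o) \;=\; \max\Big\{\,
  \underbrace{Q(s, o)}_{\text{continue current option}},\;
  \underbrace{\max_{o'} Q(s, o') - \delta}_{\text{terminate and switch}}
\Big\}
\]
```

Under such a rule, an agent terminates its option only when the gain from switching exceeds the penalty $\delta$, trading off single-agent flexibility against predictability to teammates.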