Authors: Alex Rogers, Michael Wooldridge, Wendelin Boehmer, Dongge Han
DOI: 10.1007/978-3-030-29911-8_7
Keywords: Perspective (graphical), Bellman equation, Flexibility (engineering), Key (cryptography), Order (business), Computer science, Risk analysis (engineering), Reinforcement learning, Duration (project management)
Abstract: In a multi-agent system, an agent's optimal policy will typically depend on the policies of other agents. Predicting the behaviours of others, and responding promptly to changes in such behaviours, is therefore a key issue in multi-agent systems research. One obvious possibility is for each agent to broadcast their current intention, for example, the currently executed option in a hierarchical RL framework. However, this approach results in inflexible agents when options have an extended duration. While adjusting the option at each step improves flexibility from a single-agent perspective, frequent changes can induce inconsistency between an agent's actual behaviour and its broadcast intention. In order to balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows the agents to flexibly terminate their options.