Author: Peter Abraham Raffensperger
DOI:
Keywords:
Abstract: Algorithmically designed reward functions can influence groups of learning agents toward measurable, desired sequential joint behaviours. Influencing learning agents toward desirable behaviours is non-trivial because of the difficulties of assigning credit for global success to the deserving agents and of inducing coordination. Quantifying joint behaviours lets us identify global success by ranking some behaviours as more desirable than others. We propose a real-valued metric for turn-taking, demonstrating how to measure one sequential joint behaviour. We describe how to measure the presence of turn-taking in simulation results, and we calculate the quantity of turn-taking that could be observed between independent random agents. We demonstrate our metric by reinterpreting previous work on emergent communication and by analysing a recorded human conversation. Given a metric, we explore the space of reward functions and identify those that result in turn-taking. We use ‘medium access games’ to model human and machine turn-taking. We present an extensive range of simulation results for pairs of Q-learning agents, and we use Nash equilibria of medium access games to develop predictors for determining which reward functions result in turn-taking. Having demonstrated the predictive power of Nash equilibria in medium access games, we focus on the synthesis of reward functions for stochastic games that result in arbitrary Nash equilibria. Our method constructs a reward function such that a particular joint behaviour is the unique equilibrium of the game, provided that such a reward function exists. This method builds on techniques for designing rewards for Markov decision processes and for normal form games. We explain our reward design methods in detail and formally prove that they are correct.