Emergent Prosociality in Multi-Agent Games Through Gifting.

Authors: Ramtin Pedarsani, Dorsa Sadigh, Erdem Biyik, Daniel A. Lazar, Woodrow Z. Wang

DOI:

Keywords:

Abstract: Coordination is often critical to forming prosocial behaviors, that is, behaviors that increase the overall sum of rewards received by all agents in a multi-agent game. However, state-of-the-art reinforcement learning algorithms often suffer from converging to socially less desirable equilibria when multiple equilibria exist. Previous works address this challenge with explicit reward shaping, which requires the strong assumption that agents can be forced to be prosocial. We propose using a less restrictive peer-rewarding mechanism, gifting, that guides the agents toward more socially desirable equilibria while allowing agents to remain selfish and decentralized. Gifting allows each agent to give some of their reward to other agents. We employ a theoretical framework that captures the benefit of gifting in converging to the prosocial equilibrium by characterizing the equilibria's basins of attraction in a dynamical system. With gifting, we demonstrate increased convergence of high-risk, general-sum coordination games to the prosocial equilibrium both via numerical analysis and experiments.

References (23)
Volodymyr Mnih, Ioannis Antonoglou, Koray Kavukcuoglu, Daan Wierstra, Martin A. Riedmiller, Alex Graves, David Silver. Playing Atari with Deep Reinforcement Learning. arXiv: Learning (2013).
Laetitia Matignon, Guillaume J. Laurent, Nadine Le Fort-Piat. Review: Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems. Knowledge Engineering Review, vol. 27, pp. 1-31 (2012). doi:10.1017/S0269888912000057
Ariel D. Procaccia, Yair Zick, Maria-Florina Balcan. Learning cooperative games. International Conference on Artificial Intelligence, vol. 2015, pp. 475-481 (2015).
Liviu Panait, Karl Tuyls, Sean Luke. Theoretical Advantages of Lenient Learners: An Evolutionary Game Theoretic Perspective. Journal of Machine Learning Research, vol. 9, pp. 423-457 (2008).
Yishay Mansour, Satinder P. Singh, Richard S. Sutton, David A. McAllester. Policy Gradient Methods for Reinforcement Learning with Function Approximation. Neural Information Processing Systems, vol. 12, pp. 1057-1063 (1999).
Dorsa Sadigh, Shankar Sastry, Sanjit A. Seshia, Anca D. Dragan. Planning for Autonomous Cars that Leverage Effects on Human Actions. Robotics: Science and Systems, vol. 12 (2016). doi:10.15607/RSS.2016.XII.029
Hirokazu Shirado, Nicholas A. Christakis. Locally noisy autonomous agents improve global human coordination in network experiments. Nature, vol. 545, pp. 370-374 (2017). doi:10.1038/NATURE22332
Dorsa Sadigh, Nick Landolfi, Shankar S. Sastry, Sanjit A. Seshia, Anca D. Dragan. Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state. Autonomous Robots, vol. 42, pp. 1405-1426 (2018). doi:10.1007/S10514-018-9746-1
Iain Dunning, Joel Z. Leibo, Karl Tuyls, Thore Graepel, Raphael Koster, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Kevin R. McKee, Tina Zhu, Heather Roff, Edward Hughes, Matthew G. Phillips. Inequity aversion improves cooperation in intertemporal social dilemmas. arXiv: Neural and Evolutionary Computing (2018).