Abstract: This paper investigates the problem of policy learning in multiagent environments using the stochastic game framework, which we briefly overview. We introduce two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence. We examine existing reinforcement learning algorithms according to these two properties and notice that they fail to simultaneously meet both criteria. We then contribute a new learning algorithm, WoLF policy hillclimbing, that is based on a simple principle: “learn quickly while losing, slowly while winning.” The algorithm is proven to be rational, and we present empirical results for a number of stochastic games showing that the algorithm converges.