Learning non-cooperative dialogue policies to beat opponent models: “The good, the bad and the ugly”

Authors: Ioannis Efstathiou, Oliver Lemon

DOI:

Keywords:

Abstract: Non-cooperative dialogue capabilities have been identified as important in a variety of application areas, including education, military operations, video games, police investigation and healthcare. In prior work, it was shown how agents can learn to use explicit manipulation moves in dialogue (e.g. “I really need wheat”) to manipulate adversaries in a simple trading game. The adversaries had a very simple opponent model. In this paper we implement a more complex opponent model for adversaries, model all trading dialogue moves as affecting the adversary’s opponent model, and work in a more complex game setting: Catan. Here we show that (even in such a non-stationary environment) agents can learn to be legitimately persuasive (“the good”) or deceitful (“the bad”). We achieve up to 11% higher success rates than a reasonable hand-crafted trading dialogue strategy (“the ugly”). We also present a novel way of encoding the state space for Reinforcement Learning of trading dialogues that reduces the state-space size to 0.005% of the original, and so reduces training times dramatically.

1 Previous work

Recently it has been demonstrated that when given the ability to perform both cooperative and non-cooperative/manipulative dialogue moves, a dialogue agent can learn to bluff and to lie during trading dialogues so as to win games more often, under various conditions such as risking penalties for being caught in deception, against a variety of adversaries (Efstathiou and Lemon, 2014b; Efstathiou and Lemon, 2014a). Some of the adversaries (which are computer programs, not humans)
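The abstract reports a state-space encoding that cuts the space to 0.005% of its original size. As a rough, hypothetical illustration of how a reduction of that order can arise (the actual encoding is not described here), the sketch below bins exact resource counts into goal-relative levels; the resource cap, the bin scheme, and all names are assumptions, and the resulting ratio only happens to be of a similar magnitude to the reported figure.

```python
# A minimal sketch, NOT the paper's actual encoding (the abstract does not
# specify it): one standard way to shrink an RL state space for trading
# dialogues is to replace exact resource counts with coarse, goal-relative
# bins. All names, bin choices, and numbers below are illustrative assumptions.

RESOURCE_TYPES = 5   # e.g. Catan resources: wood, clay, sheep, wheat, ore
MAX_COUNT = 19       # assumed cap on how many of one resource a player holds

def full_state_space() -> int:
    """Exact counts per resource: (MAX_COUNT + 1) ** RESOURCE_TYPES states."""
    return (MAX_COUNT + 1) ** RESOURCE_TYPES

def binned_state_space(num_bins: int = 3) -> int:
    """Coarse bins per resource: num_bins ** RESOURCE_TYPES states."""
    return num_bins ** RESOURCE_TYPES

def encode(counts, goal):
    """Map exact resource counts to goal-relative bins: 0 = none,
    1 = some but short of the goal, 2 = enough for the goal."""
    def bin_one(have: int, need: int) -> int:
        if have == 0:
            return 0
        return 1 if have < need else 2
    return tuple(bin_one(h, n) for h, n in zip(counts, goal))

if __name__ == "__main__":
    full, small = full_state_space(), binned_state_space()
    print(f"exact-count states: {full:,}")       # 3,200,000
    print(f"binned states:      {small:,}")      # 243
    print(f"ratio:              {100 * small / full:.4f}% of the original")
    # Holding (0, 3, 1, 7, 2) against a build goal of (1, 2, 2, 4, 3):
    print(encode((0, 3, 1, 7, 2), (1, 2, 2, 4, 3)))  # -> (0, 2, 1, 2, 1)
```

The design point is that goal-relative binning keeps exactly the distinctions a trading policy needs (nothing / not enough / enough of each resource) while discarding count detail that rarely changes the optimal dialogue move, which is what makes such large reductions, and correspondingly faster training, possible.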

References (0)