Authors: Oliver Lemon, Ioannis Efstathiou
DOI: 10.3233/978-1-61499-419-0-999
Keywords: Bluff, Order (exchange), Perfect information, Reinforcement learning, Information hiding, Adversary, Deception, Variety (cybernetics), Artificial intelligence, Computer science
Abstract: Non-cooperative dialogue behaviour for artificial agents (e.g. deception and information hiding) has been identified as important in a variety of application areas, including education and healthcare, but it has not yet been addressed using modern statistical approaches to agents. Deception has also been argued to be a requirement for high-order intentionality in AI. We develop and evaluate an agent trained with Reinforcement Learning which learns to perform non-cooperative moves in order to complete its own objectives in a stochastic trading game with imperfect information. We show that, when given the ability to perform both cooperative and non-cooperative moves, such an agent can learn to bluff and to lie so as to win more games. For example, we show that it wins 10.5% more games against a strong rule-based adversary, compared to an optimised agent that cannot perform non-cooperative moves. This work is a first demonstration of how agents can learn to use non-cooperative moves in this way to meet their goals.
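The abstract describes a Reinforcement Learning agent that learns when to make non-cooperative moves (e.g. bluffing) against an adversary in a stochastic trading game with imperfect information. As a rough illustration of that idea only, the following sketch trains a tabular Q-learning agent against a simple rule-based adversary; the game rules, resource names ("wood", "stone", "sheep"), action set, reward values and hyperparameters are assumptions made for this sketch and are not taken from the paper.

```python
# Minimal illustrative sketch, NOT the authors' implementation: a tabular
# Q-learning agent in a toy trading game where it can either reveal its true
# goal (a cooperative move) or bluff about it (a non-cooperative move).
# The resources, adversary rule, win probabilities and hyperparameters
# are assumptions made only for this example.
import random
from collections import defaultdict

RESOURCES = ["wood", "stone", "sheep"]    # hypothetical resource types
ACTIONS = ["reveal_goal", "bluff"]        # cooperative vs. non-cooperative move
ALPHA, EPSILON, EPISODES = 0.1, 0.1, 20000

def play_episode(q, learn=True):
    """Play one hand: the agent needs one target resource; a rule-based
    adversary withholds whichever resource the agent claims to need."""
    target = random.choice(RESOURCES)
    state = target                                    # state = resource actually needed
    if learn and random.random() < EPSILON:           # epsilon-greedy exploration
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
    if action == "reveal_goal":
        claimed = target                              # honest, cooperative statement
    else:
        claimed = random.choice([r for r in RESOURCES if r != target])  # bluff
    blocked = claimed                                 # adversary blocks the claimed resource
    # Stochastic outcome: winning is far more likely if the true target was not blocked.
    win = random.random() < (0.9 if target != blocked else 0.2)
    reward = 1.0 if win else -1.0
    if learn:
        # One-step update; the episode is terminal, so there is no bootstrapped term.
        q[(state, action)] += ALPHA * (reward - q[(state, action)])
    return win

if __name__ == "__main__":
    q = defaultdict(float)
    for _ in range(EPISODES):
        play_episode(q)
    wins = sum(play_episode(q, learn=False) for _ in range(2000))
    print(f"win rate with learned policy: {wins / 2000:.2%}")
```

Under these assumed dynamics the learned policy converges to the bluffing move, mirroring only the qualitative claim in the abstract (an agent allowed non-cooperative moves learns to bluff and wins more games); none of the numbers correspond to the paper's experiments.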