Authors: Michail G. Lagoudakis, Ronald Parr
DOI:
Keywords:
Abstract: Reinforcement learning is a promising paradigm in which an agent learns how to make good decisions by interacting with an (unknown) environment. This framework can be extended along two dimensions: the number of decision makers (single- or multi-agent) and the nature of the interaction (collaborative or competitive). This characterization leads to four decision-making situations that are considered in this thesis and modeled as Markov decision processes, team Markov decision processes, zero-sum Markov games, and team zero-sum Markov games. Existing reinforcement learning algorithms have not been applied widely to real-world problems, mainly because the required resources grow fast as a function of the size of the problem. Exact, but impractical, solutions are commonly abandoned in favor of approximate, but practical, solutions. Unfortunately, research on efficient and stable approximate methods has focused mostly on the prediction problem, where the learner tries to learn the outcome of a fixed policy. This thesis contributes two methods based on general policy iteration for the control problem, where the learner tries to learn a good policy. The Least-Squares Policy Iteration (LSPI) algorithm learns policies using a least-squares fixed-point approximation of the value function. LSPI makes efficient use of sample experience and, therefore, is most appropriate for domains where training data is expensive or a simulator of the process is available. Rollout Classification Policy Iteration (RCPI), on the other hand, uses rollouts (Monte-Carlo simulation estimates) to train a classifier that represents the policy; for this reason, RCPI is most appropriate for domains where simulated experience comes at no (or low) cost. Both methods exhibit nice theoretical properties, and they bear strong connections to other areas, such as feature selection and classification learning, respectively. The proposed methods are demonstrated on a variety of learning tasks: chain walk, inverted pendulum balancing, bicycle balancing and riding, the game of Tetris, multiagent system administration, distributed power grid control, server-router flow control, a two-player soccer game, and ... control. These results demonstrate clearly the efficiency and applicability of the new methods to large problems.
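The core idea behind LSPI is the evaluation step mentioned in the abstract: from a batch of samples, solve for the weights of a linear Q-function whose least-squares fixed point is consistent with the sampled Bellman equations under the current policy, then act greedily with respect to it. The sketch below illustrates this idea in Python; it is not the thesis implementation, and the sample format `(s, a, r, s_next)`, the feature map `phi`, the ridge term, and the convergence tolerance are illustrative assumptions.

```python
import numpy as np

def lstdq(samples, phi, policy, gamma, k):
    """Least-squares fixed-point approximation of Q^policy from a batch of samples.

    samples : iterable of (s, a, r, s_next) tuples
    phi     : feature map, phi(s, a) -> length-k numpy vector
    policy  : current policy, policy(s) -> action
    """
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))   # features of the policy's action in the next state
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)   # small ridge term for numerical stability

def lspi(samples, phi, actions, k, gamma=0.95, iters=20):
    """Approximate policy iteration: evaluate with LSTD-Q, improve greedily, repeat."""
    w = np.zeros(k)
    greedy = lambda s: max(actions, key=lambda a: phi(s, a) @ w)   # greedy policy w.r.t. current weights
    for _ in range(iters):
        w_new = lstdq(samples, phi, greedy, gamma, k)
        if np.linalg.norm(w_new - w) < 1e-6:   # weights (and hence the policy) have converged
            break
        w = w_new
    return w, greedy
```

Because the same batch of samples is reused at every iteration, the method is sample-efficient, which is why the abstract recommends it for domains where data is expensive to collect.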
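RCPI, by contrast, avoids an explicit value-function representation and instead trains a classifier on rollout-labeled states. The following is a minimal sketch of one improvement step under assumptions not taken from the thesis: a generative model `env_step(s, a) -> (s_next, reward)`, a finite action set, a state featurizer, and scikit-learn's DecisionTreeClassifier standing in for whatever classifier one prefers. A faithful implementation would also test whether the best action statistically dominates the others before labeling a state.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rollout_value(env_step, s, a, policy, gamma, horizon, n_rollouts):
    """Monte-Carlo estimate of Q^policy(s, a): take action a in s, then follow policy."""
    total = 0.0
    for _ in range(n_rollouts):
        state, action, discount, ret = s, a, 1.0, 0.0
        for _ in range(horizon):
            state, r = env_step(state, action)
            ret += discount * r
            discount *= gamma
            action = policy(state)          # follow the current policy after the first action
        total += ret
    return total / n_rollouts

def rcpi_iteration(env_step, states, actions, policy, featurize,
                   gamma=0.95, horizon=50, n_rollouts=10):
    """One rollout-classification step: label sampled states with the empirically
    best action and fit a classifier, which becomes the improved policy."""
    X, y = [], []
    for s in states:
        q = {a: rollout_value(env_step, s, a, policy, gamma, horizon, n_rollouts)
             for a in actions}
        X.append(featurize(s))
        y.append(max(q, key=q.get))          # action with the highest rollout estimate
    clf = DecisionTreeClassifier().fit(np.array(X), np.array(y))
    return lambda s: clf.predict([featurize(s)])[0]   # improved policy
```

Since rollouts require many simulated trajectories per labeled state, this style of policy iteration is only attractive when such simulated experience is essentially free, as the abstract notes.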