Efficient approximate policy iteration methods for sequential decision making in reinforcement learning

Authors: Michail G. Lagoudakis, Ronald Parr

DOI:

Keywords:

Abstract: Reinforcement learning is a promising paradigm in which an agent learns how to make good decisions by interacting with an (unknown) environment. This framework can be extended along two dimensions: the number of decision makers (single- or multi-agent) and the nature of the interaction (collaborative or competitive). This characterization leads to four decision-making situations that are considered in this thesis and modeled as Markov decision processes, team Markov decision processes, zero-sum Markov games, and team zero-sum Markov games. Existing reinforcement learning algorithms have not been applied widely to real-world problems, mainly because the required resources grow fast as a function of the size of the problem. Exact, but impractical, solutions are commonly abandoned in favor of approximate, yet practical, solutions. Unfortunately, research on efficient and stable approximate methods has focused mostly on the prediction problem, where the agent tries to learn the outcome of a fixed policy. This thesis contributes methods based on general policy iteration for the control problem, where the agent tries to learn a good policy. The Least-Squares Policy Iteration (LSPI) algorithm learns policies through a least-squares fixed-point approximation of the value function. LSPI makes efficient use of sample experience and is therefore most appropriate for domains where training data are expensive or a simulator of the process is available. Rollout Classification Policy Iteration (RCPI), on the other hand, uses rollouts (Monte-Carlo simulation estimates) to train a classifier that represents the policy; for this reason, RCPI is most appropriate for domains where rollouts come at no (or low) cost. Both algorithms exhibit nice theoretical properties and bear strong connections to other areas, namely feature selection and classification learning, respectively. The proposed methods are demonstrated on a variety of tasks: chain walk, inverted pendulum balancing, bicycle balancing and riding, the game of Tetris, multiagent system administration, distributed power grid control, server-router flow control, and a two-player soccer game. These results demonstrate clearly the efficiency and applicability of the new methods to large problems.
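To make the abstract's description of LSPI concrete, the following is a minimal sketch of least-squares fixed-point policy evaluation wrapped in a policy-iteration loop, assuming a linear Q-function over a user-supplied feature map. The function names (lstdq, lspi), the feature map phi, the ridge term, and the convergence test are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def lstdq(samples, phi, k, policy, gamma=0.95):
    """Least-squares fixed-point evaluation of the Q-function of `policy`."""
    A = 1e-6 * np.eye(k)          # small ridge term so the solve stays well-posed (assumption)
    b = np.zeros(k)
    for (s, a, r, s_next) in samples:
        f = phi(s, a)                          # features of the visited state-action pair
        f_next = phi(s_next, policy(s_next))   # features of the successor under `policy`
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)               # weights w with Q(s, a) ~ phi(s, a) . w

def lspi(samples, phi, k, actions, gamma=0.95, max_iters=20, tol=1e-6):
    """Approximate policy iteration: alternate LSTDQ evaluation with greedy improvement."""
    w = np.zeros(k)
    greedy = lambda s: max(actions, key=lambda a: phi(s, a) @ w)  # policy implied by current w
    for _ in range(max_iters):
        w_new = lstdq(samples, phi, k, greedy, gamma)
        if np.linalg.norm(w_new - w) < tol:    # weights (and hence the policy) have stabilized
            return w_new
        w = w_new
    return w
```

A rollout-based variant in the spirit of RCPI would instead estimate the best action in a set of sampled states by Monte-Carlo rollouts and fit a classifier to those state-action labels, avoiding an explicit value-function representation.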
