Authors: Peter Stone, W. Bradley Knox
Keywords:
Abstract: As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the tamer framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on tamer showed that shaping can greatly reduce the sample complexity required to learn a good policy, can enable lay users to teach agents the behaviors they desire, and can allow agents to learn within a Markov Decision Process (MDP) in the absence of a coded reward function. However, tamer does not allow this human training to be combined with autonomous learning based on such a coded reward function. This paper leverages the fast learning exhibited within the tamer framework to hasten a reinforcement learning (RL) algorithm's climb up the learning curve, effectively demonstrating that human reinforcement and MDP reward can be used in conjunction with one another by an autonomous agent. We tested eight plausible tamer+rl methods for combining a previously learned human reinforcement function, H, with MDP reward in a reinforcement learning algorithm. This paper identifies which of these methods are most effective and analyzes their strengths and weaknesses. Results from the tamer+rl algorithms indicate better final performance and better cumulative performance than either a tamer agent or an RL agent alone.
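The abstract does not spell out the eight tamer+rl combination methods, but one plausible family is shaping the MDP reward with the learned human reinforcement function H. The sketch below is a minimal, hedged illustration of that idea in Python, not the paper's actual algorithm: the names `h_hat`, `h_weight`, and the gym-style `env` interface (`reset()`, `step()`, `action_space`) are assumptions introduced here for illustration.

```python
import random
from collections import defaultdict

def h_hat(state, action):
    """Stand-in for the learned human reinforcement function H.
    In TAMER, H would be a regression model fit to trainer feedback;
    here it is a placeholder returning a constant."""
    return 0.0

def tamer_rl_q_learning(env, episodes=100, alpha=0.1, gamma=0.99,
                        epsilon=0.1, h_weight=1.0):
    """Q-learning whose effective reward is R(s, a) + h_weight * H(s, a).

    Assumes `env` exposes reset() -> state,
    step(action) -> (next_state, reward, done),
    and action_space as a list of discrete actions.
    """
    q = defaultdict(float)          # Q-values keyed by (state, action)
    actions = env.action_space

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection over the current Q-values.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])

            next_state, reward, done = env.step(action)

            # Combine MDP reward with the human reinforcement model:
            # one plausible tamer+rl combination, assumed for illustration.
            shaped = reward + h_weight * h_hat(state, action)

            # Standard Q-learning update applied to the shaped reward.
            best_next = max(q[(next_state, a)] for a in actions)
            target = shaped + (0.0 if done else gamma * best_next)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q
```

In this sketch, setting `h_weight` to 0 recovers plain Q-learning on the MDP reward alone, so the single weight parameter controls how strongly the previously learned H influences autonomous learning.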