Reinforcement learning from human reward: Discounting in episodic tasks

Authors: W. Bradley Knox, Peter Stone

DOI: 10.1109/ROMAN.2012.6343862

Keywords: Discounting, Reinforcement learning, Credence, Trainer, Behavioural sciences, Machine learning, Task (project management), Cognitive psychology, Artificial intelligence, Computer science, Space (commercial competition)

Abstract: … for learning from human reward has hitherto not been explored systematically. Using model-based reinforcement learning … future rewards should be discounted to create behavior that …

References (3)
Andrew Y. Ng, Stuart J. Russell, Daishi Harada, Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. International Conference on Machine Learning, pp. 278-287 (1999)
Brenna D. Argall, Sonia Chernova, Manuela Veloso, Brett Browning, A survey of robot learning from demonstration. Robotics and Autonomous Systems, vol. 57, pp. 469-483 (2009), 10.1016/J.ROBOT.2008.10.024
R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (1998)