DOI:
Keywords: Plan (drawing), Reinforcement learning, Computer science, Trajectory, Artificial intelligence, Nonparametric model, Error-driven learning, Machine learning, Learning classifier system
Abstract: This paper describes some of the interactions of model learning algorithms and planning algorithms we have found in exploring model-based reinforcement learning. The paper focuses on how local trajectory optimizers can be used effectively with learned non-parametric models. We find that planners that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. Trajectory planners that balance obeying the learned model with minimizing cost (or maximizing reward) often do better, even if the resulting plan is not fully consistent with the learned model.
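The trade-off the abstract describes can be sketched as a soft-constrained trajectory optimization: instead of forcing each planned transition to match the learned model exactly, model-consistency violations are penalized with a finite weight `lam` alongside the task cost. The sketch below is illustrative, not the paper's implementation; the function `f` is a hypothetical stand-in for a learned nonparametric model, and the cost weights, horizon `T`, and goal are all assumed values.

```python
import numpy as np
from scipy.optimize import minimize

def f(x, u):
    """Hypothetical stand-in for a learned nonparametric model
    (here: simple linear dynamics of a 1-D point)."""
    return x + 0.1 * u

def soft_objective(z, x0, T, lam, goal):
    """Task cost plus a soft penalty on model-consistency violations.

    z packs the decision variables: states x[1..T] then controls u[0..T-1].
    With lam -> infinity this approaches a plan fully consistent with f;
    a finite lam lets the optimizer trade model fidelity for lower cost.
    """
    x = np.concatenate(([x0], z[:T]))        # states x[0..T]
    u = z[T:]                                # controls u[0..T-1]
    task_cost = np.sum(u ** 2) + 100.0 * (x[-1] - goal) ** 2
    defect = x[1:] - f(x[:-1], u)            # per-step model mismatch
    return task_cost + lam * np.sum(defect ** 2)

def plan(x0=0.0, T=10, lam=100.0, goal=1.0):
    """Optimize states and controls jointly from a zero initialization."""
    z0 = np.zeros(2 * T)
    res = minimize(soft_objective, z0, args=(x0, T, lam, goal),
                   method="L-BFGS-B")
    x = np.concatenate(([x0], res.x[:T]))
    u = res.x[T:]
    return x, u

x, u = plan()
```

Because the model-consistency terms are penalties rather than hard constraints, the optimizer can make progress toward the goal even when the learned model is inaccurate early in learning, which is the behavior the abstract reports as advantageous.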