Authors: Sridhar Mahadevan, Jonathan Connell
DOI:
Keywords:
Abstract: This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error, using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine techniques for propagating reinforcement values temporally across actions and spatially across states. A robot called OBELIX (see Figure 1) is described that learns several component behaviors in an example task involving pushing boxes. An experimental study using the robot suggests two conclusions. One, the learning techniques are able to learn the individual behaviors, sometimes outperforming a hand-coded program. Two, a behavior-based architecture is better than a monolithic architecture for learning the box-pushing task.