Authors: David Foster, Peter Dayan
Keywords:
Abstract: Solving in an efficient manner many different optimal control tasks within the same underlying environment requires decomposing the environment into its computationally elemental fragments. We suggest how to find such fragmentations by applying unsupervised, mixture-model learning methods to data derived from value functions for multiple tasks, and show that the resulting fragmentations accord with observable structure in the environments. Further, we present evidence that such fragments can be of use in a practical reinforcement learning context, by facilitating online, actor-critic learning of multiple goals in MDPs.
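The abstract describes fragmenting an environment by applying unsupervised mixture-model learning to value functions computed for many tasks. As a rough illustration of that idea only, and not the paper's actual setup, the sketch below builds a small grid world, solves several goal-reaching tasks by value iteration, and fits a Gaussian mixture (scikit-learn's GaussianMixture, an assumed stand-in for the paper's mixture learner) to each state's vector of values across tasks; the grid size, number of tasks, and component count are arbitrary choices.

```python
"""Minimal sketch: clustering states by their value-function profiles across tasks.

Assumptions (not taken from the paper): a 5x5 deterministic grid world,
goal-reward tasks solved by value iteration, and a Gaussian mixture model
as the unsupervised learner. Illustrative only.
"""
import numpy as np
from sklearn.mixture import GaussianMixture

side, n_tasks, gamma = 5, 10, 0.9
n_states = side * side

def value_iteration(P, R, gamma, iters=200):
    # P: (A, S, S) transition matrices, R: (S,) reward received on arrival.
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        V = np.max(P @ (R + gamma * V), axis=0)
    return V

# Build a grid world with 4 deterministic move actions (stay put at walls).
P = np.zeros((4, n_states, n_states))
for s in range(n_states):
    r, c = divmod(s, side)
    for a, (nr, nc) in enumerate([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]):
        ns = nr * side + nc if 0 <= nr < side and 0 <= nc < side else s
        P[a, s, ns] = 1.0

# One optimal value function per task; each task rewards a random goal state.
rng = np.random.default_rng(0)
V = np.zeros((n_states, n_tasks))
for t in range(n_tasks):
    R = np.zeros(n_states)
    R[rng.integers(n_states)] = 1.0
    V[:, t] = value_iteration(P, R, gamma)

# Each state is described by its values across tasks; a mixture model over
# these profiles groups states into candidate "fragments" of the environment.
gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0).fit(V)
fragments = gmm.predict(V)
print(fragments.reshape(side, side))
```

The design choice here is that clustering operates on per-state feature vectors whose coordinates are the values of that state under different tasks, so states with similar roles across tasks fall into the same mixture component.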