Authors: Krishnamurthy Dvijotham, Yuval Tassa, Cosmin Paduraru, Todd Hester, Gal Dalal
DOI:
Keywords:
Abstract: … An appropriate Lyapunov function was identified for policy at … 5 depicts the drawbacks of reward shaping for ensuring safety. … the best reward shaping choice, and to no reward shaping at …