Authors: George Michailidis, Ambuj Tewari, Mohamad Kazem Shirani Faradonbeh
DOI:
Keywords:
Abstract: In decision making problems for continuous state and action spaces, linear dynamical models are widely employed. Specifically, policies for stochastic linear systems subject to quadratic cost functions capture a large number of applications in reinforcement learning. Selected randomized policies that address the trade-off between identification and control have been studied in the literature recently. However, little is known about policies based on bootstrapping observed states and actions. In this work, we show that bootstrap-based policies achieve a square root scaling of regret with respect to time. We also obtain results on the accuracy of learning the model's dynamics. Corroborative numerical analysis that illustrates the technical results is also provided.