Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof

Authors: A. Al-Tamimi, F. L. Lewis, M. Abu-Khalaf

DOI: 10.1109/TSMCB.2008.926614

Keywords:

Abstract: Convergence of the value-iteration-based heuristic dynamic programming (HDP) algorithm is proven in the case of general nonlinear systems. That is, it is shown that HDP converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control. It is assumed that, at each iteration, the value and action update equations can be exactly solved. The following two standard neural networks (NN) are used: a critic NN is used to …
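
To make the iteration described in the abstract concrete, here is a minimal sketch of value-iteration HDP on a hypothetical scalar system x_{k+1} = 0.8 sin(x_k) + u_k with quadratic stage cost (this example is not from the paper). A least-squares polynomial fit stands in for the critic NN, and a grid search over u stands in for the exact action update the paper assumes; the basis, grids, and tolerances are illustrative choices.

```python
import numpy as np

# --- Hypothetical problem data (not from the paper) ---
def f(x):                 # drift dynamics: x_{k+1} = f(x) + g(x) u
    return 0.8 * np.sin(x)

def g(x):                 # input gain
    return np.ones_like(x)

Q, R = 1.0, 1.0           # quadratic stage cost r(x, u) = Q x^2 + R u^2

def phi(x):               # even polynomial critic basis, V(x) ~ phi(x) @ w
    return np.stack([x**2, x**4, x**6], axis=-1)

xs = np.linspace(-2.0, 2.0, 201)   # sampled states for least-squares fits
us = np.linspace(-3.0, 3.0, 301)   # action grid approximating the exact min

w = np.zeros(3)                    # V_0 = 0, as in value iteration
for i in range(50):
    # Action update: u_i(x) = argmin_u [ r(x, u) + V_i(f(x) + g(x) u) ]
    X, U = np.meshgrid(xs, us, indexing="ij")
    x_next = f(X) + g(X) * U
    cost = Q * X**2 + R * U**2 + phi(x_next) @ w
    u_star = us[np.argmin(cost, axis=1)]

    # Value update: V_{i+1}(x) = r(x, u_i(x)) + V_i(f(x) + g(x) u_i(x)),
    # fitted to the critic basis by least squares over the sampled states
    target = Q * xs**2 + R * u_star**2 + phi(f(xs) + g(xs) * u_star) @ w
    w_new, *_ = np.linalg.lstsq(phi(xs), target, rcond=None)

    if np.linalg.norm(w_new - w) < 1e-8:   # stop once the weights settle
        w = w_new
        break
    w = w_new

print("critic weights after", i + 1, "iterations:", w)
```

Under the paper's assumption that both update equations are solved exactly, the sequence V_i generated this way is monotonically nondecreasing from V_0 = 0 and converges to the HJB value function; the grid and basis approximations above only sketch that scheme.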
