Discrete Dynamic Programming

作者: David Blackwell

DOI: 10.1214/AOMS/1177704593

关键词: CombinatoricsLimiting case (mathematics)Unit (ring theory)Function (mathematics)Action (physics)State (functional analysis)Special caseMathematicsCurrent (mathematics)Finite setDiscrete mathematics

摘要: We consider a system with finite number $S$ of states $s$, labeled by the integers $1, 2, \cdots, S$. Periodically, say once day, we observe current state system, and then choose an action $a$ from set $A$ possible actions. As joint result $s$ chosen $a$, two things happen: (1) receive immediate income $i(s, a)$ (2) moves to new $s'$ probability particular given function $q = q(s' \mid s, a)$. Finally there is specified discount factor $\beta, 0 \leqq \beta < 1$, so that value unit $n$ days in future $\beta^n$. Our problem policy which maximizes our total expected income. This problem, interesting special case general dynamic programming has been solved Howard his excellent book [3]. The $\beta also studied Howard, substantially more difficult. shall obtain this results slightly beyond those though still not complete. method, treats 1$ as limiting seems rather simpler than Howard's.

参考文章(0)