作者: David Blackwell
关键词: Combinatorics 、 Limiting case (mathematics) 、 Unit (ring theory) 、 Function (mathematics) 、 Action (physics) 、 State (functional analysis) 、 Special case 、 Mathematics 、 Current (mathematics) 、 Finite set 、 Discrete mathematics
摘要: We consider a system with finite number $S$ of states $s$, labeled by the integers $1, 2, \cdots, S$. Periodically, say once day, we observe current state system, and then choose an action $a$ from set $A$ possible actions. As joint result $s$ chosen $a$, two things happen: (1) receive immediate income $i(s, a)$ (2) moves to new $s'$ probability particular given function $q = q(s' \mid s, a)$. Finally there is specified discount factor $\beta, 0 \leqq \beta < 1$, so that value unit $n$ days in future $\beta^n$. Our problem policy which maximizes our total expected income. This problem, interesting special case general dynamic programming has been solved Howard his excellent book [3]. The $\beta also studied Howard, substantially more difficult. shall obtain this results slightly beyond those though still not complete. method, treats 1$ as limiting seems rather simpler than Howard's.