Abstract: The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that, for a special case of temporal differences, the expected values of the predictions converge to their correct values as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that a slightly modified form of temporal difference learning converges with probability one, and shows how to quantify the rate of convergence.
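As a concrete illustration of the kind of prediction learning the abstract describes, the following is a minimal sketch (not the paper's construction) of tabular TD(0) on a simple random-walk Markov chain; the state count, step size, and episode budget are illustrative choices. Each update moves the estimate toward a bootstrapped target sampled from the process generating the agent's future:

```python
import random

def td0_predict(n_states=5, episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    """TD(0) prediction on a random walk over states 0..n_states-1.

    Episodes start in the middle state and step left or right uniformly;
    stepping off the left end terminates with reward 0, off the right
    end with reward 1. Returns the learned value estimates, which should
    approach the true values i+1 / (n_states+1) for state i.
    """
    rng = random.Random(seed)
    V = [0.5] * n_states  # initial value estimates
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s_next = s + rng.choice([-1, 1])
            if s_next < 0:            # terminated left: target is reward 0
                V[s] += alpha * (0.0 - V[s])
                break
            elif s_next >= n_states:  # terminated right: target is reward 1
                V[s] += alpha * (1.0 - V[s])
                break
            else:                     # TD(0): bootstrap from the next state
                V[s] += alpha * (gamma * V[s_next] - V[s])
                s = s_next
    return V
```

With a constant step size α the estimates only fluctuate around the true values; the probability-one convergence the article establishes corresponds to the regime where the step sizes decrease according to the usual stochastic-approximation conditions.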