Authors: B. J. Oommen, Eldon Hansen
DOI: 10.1109/TSMC.1984.6313256
Keywords:
Abstract: The automata considered have a variable structure and hence are completely described by action probability updating functions. The action probabilities can take only a finite number of prespecified values. These values increase linearly, and the interval [0, 1] is divided into a number of equal-length subintervals. The probabilities are updated only if the environment responds with a reward, and hence the automata are called discretized linear reward-inaction automata. The asymptotic optimality of this family of automata is proved for all environments.
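The updating scheme summarized in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the two-action case, assumes that a reward moves the chosen action's probability up by exactly one step of size 1/N, and uses hypothetical names (`DiscretizedLRI`, `resolution`, `run`, `reward_probs`) together with a simulated stationary random environment. The probability is stored as an integer index so that it always stays on the grid {0, 1/N, ..., 1}.

```python
import random


class DiscretizedLRI:
    """Two-action discretized linear reward-inaction automaton (illustrative sketch).

    Assumption: the probability of action 0 is k / n for an integer k in 0..n,
    so it can take only n + 1 equally spaced values in [0, 1].
    """

    def __init__(self, resolution, rng=None):
        if resolution < 2 or resolution % 2 != 0:
            raise ValueError("resolution must be an even integer >= 2")
        self.n = resolution
        self.k = resolution // 2          # start at p1 = 0.5 (assumed initial state)
        self.rng = rng or random.Random()

    @property
    def p1(self):
        """Current probability of choosing action 0."""
        return self.k / self.n

    def choose(self):
        """Sample an action (0 or 1) from the current action probabilities."""
        return 0 if self.rng.random() < self.p1 else 1

    def update(self, action, rewarded):
        """Move one grid step toward the rewarded action; do nothing on a penalty."""
        if not rewarded:
            return                         # "inaction": penalties leave the state unchanged
        if action == 0:
            self.k = min(self.k + 1, self.n)
        else:
            self.k = max(self.k - 1, 0)


def run(automaton, reward_probs, steps, rng):
    """Simulate a hypothetical stationary environment with per-action reward probabilities."""
    for _ in range(steps):
        action = automaton.choose()
        rewarded = rng.random() < reward_probs[action]
        automaton.update(action, rewarded)
    return automaton.p1
```

In this sketch the end states k = 0 and k = n are absorbing: once one action has probability 1 it is always chosen, and neither a reward nor a penalty can move the index off the boundary. A call such as `run(DiscretizedLRI(10), (0.9, 0.2), 10000, random.Random())` would typically end in one of those two states, with a larger `resolution` making the better action the more likely limit.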