Author: B. John Oommen
DOI: 10.1109/TSMC.1986.4308951
Keywords:
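Abstract: A learning automaton is a machine that interacts with a random environment and simultaneously learns the optimal action that the environment offers to it. Learning automata with variable structure are considered. Such automata are completely defined by a set of action probability updating rules. Contrary to all the variable-structure stochastic automata (VSSA) discussed in the literature, which update the probabilities in such a way that an action probability can take any real value in the interval [0,1], the probability space is discretized so as to permit the action probability to assume one of a finite number of distinct values in [0,1]. The automata are termed linear or nonlinear depending on whether the subintervals of [0,1] are of equal length. It is proven that 1) discretized two-action linear reward-inaction automata are absorbing and ε-optimal in all environments; 2) discretized two-action linear inaction-penalty automata are ergodic and expedient in all environments; 3) discretized two-action linear inaction-penalty automata with artificially created absorbing barriers are ε-optimal in all environments; and 4) there exist discretized two-action automata that are ergodic and ε-optimal in all environments. The maximum advantage gained by rendering any finite-state machine stochastic has also been derived.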
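
The discretized reward-inaction idea described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `simulate_dlri`, the integer grid index `k`, the midpoint starting state, and the Bernoulli environment with fixed reward probabilities are all assumptions made for illustration. The probability of choosing action 1 is restricted to the grid k/N, moves one grid step toward the chosen action on a reward, and is left unchanged on a penalty, so the end points 0 and 1 are absorbing.

```python
import random


def simulate_dlri(reward_probs, resolution, rng, max_steps=100_000):
    """Simulate a two-action discretized linear reward-inaction automaton.

    reward_probs: (d1, d2), assumed reward probabilities of the two actions.
    resolution:   N, so the probability of action 1 only takes values k/N.
    rng:          a random.Random instance (kept explicit for repeatability).
    Returns (k, steps), where k/N is the final probability of action 1.
    """
    n = resolution
    k = n // 2  # assumed starting point: the middle of the grid
    for step in range(max_steps):
        if k == 0 or k == n:
            # Absorbing states: one action is now chosen with probability 1.
            return k, step
        action = 0 if rng.random() < k / n else 1
        rewarded = rng.random() < reward_probs[action]
        if rewarded:
            # Reward: move one grid step toward the action just chosen.
            k += 1 if action == 0 else -1
        # Penalty: inaction, the state is left unchanged.
    return k, max_steps


if __name__ == "__main__":
    k, steps = simulate_dlri((0.8, 0.4), resolution=20, rng=random.Random(0))
    print(f"final p1 = {k}/20 after {steps} steps")
```

Tracking the integer index `k` rather than a floating-point probability keeps the state exactly on the grid. A larger `resolution` gives a finer grid, which in this sketch trades slower convergence for a lower chance of being absorbed at the inferior action.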