Learning and Sequential Decision Making

Authors: A. G. Barto, R. S. Sutton, C. J. C. H. Watkins

DOI:

Keywords:

Abstract: In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference", or TD, methods are related to the theory of sequential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of long-term payoff gain. Mathematically, such tasks can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the non-engineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation.
