Approximating optimal policies for partially observable stochastic domains

Authors: Stuart Russell, Ronald Parr

DOI:

Keywords: Markov decision process; Q-learning; State (functional analysis); Mathematical optimization; Observable; Partially observable Markov decision process; Test case; Reinforcement learning; Mathematics; Markov model

Abstract: The problem of making optimal decisions in uncertain conditions is central to Artificial Intelligence. If the state of the world is known at all times, the world can be modeled as a Markov Decision Process (MDP). MDPs have been studied extensively, and many methods are known for determining optimal courses of action, or policies. The more realistic case, in which state information is only partially observable, is modeled by Partially Observable Markov Decision Processes (POMDPs), which have received much less attention. The best exact algorithms for these problems are very inefficient in both space and time. We introduce Smooth Partially Observable Value Approximation (SPOVA), a new approximation method that can quickly yield good approximations and can improve over time. This method can be combined with reinforcement learning methods, a combination that was very effective in our test cases.
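The abstract gives only a high-level description of SPOVA. As a hedged illustration (not the paper's own code), the sketch below assumes the smooth-max value representation reported for SPOVA, V(b) = (Σ_i (b·α_i)^k)^(1/k), which approaches the piecewise-linear max over alpha vectors as k grows, and pairs it with a TD-style update in the spirit of the reinforcement-learning combination the abstract mentions. The alpha vectors, exponent k, learning rate, and belief are illustrative placeholders, not values from the paper.

```python
import numpy as np

def spova_value(belief, alphas, k=8.0):
    """Smooth-max value of a belief state: V(b) = (sum_i (b . alpha_i)^k)^(1/k).

    As k -> infinity this tends to max_i (b . alpha_i), the usual
    piecewise-linear POMDP value representation, but stays differentiable.
    """
    dots = alphas @ belief                 # one dot product per alpha vector
    dots = np.clip(dots, 1e-9, None)       # assumption: keep terms positive so x**k is well-defined
    return np.sum(dots ** k) ** (1.0 / k)

def td_update(belief, target, alphas, k=8.0, lr=0.05):
    """One TD-style update (an assumption modeled on the paper's RL combination):
    nudge the alpha vectors so V(belief) moves toward a target value
    (e.g. reward plus discounted value of the successor belief)."""
    v = spova_value(belief, alphas, k)
    dots = np.clip(alphas @ belief, 1e-9, None)
    # dV/d(alpha_i) = V^(1-k) * (b . alpha_i)^(k-1) * b, by the chain rule.
    grad = (v ** (1.0 - k)) * (dots ** (k - 1.0))[:, None] * belief[None, :]
    alphas += lr * (target - v) * grad     # gradient step on the squared TD error
    return v

# Tiny usage example: a 3-state problem with 2 alpha vectors.
rng = np.random.default_rng(0)
alphas = rng.uniform(0.1, 1.0, size=(2, 3))
b = np.array([0.5, 0.3, 0.2])              # belief: a distribution over states
print(spova_value(b, alphas))
td_update(b, target=1.0, alphas=alphas)
```

The smooth form matters because the exact max over alpha vectors is not differentiable, whereas this representation admits the gradient updates shown above.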

References (14)
Michael L. Littman, Anthony R. Cassandra, Leslie Pack Kaelbling, Learning policies for partially observable environments: Scaling up, Machine Learning Proceedings 1995, pp. 362-370 (1995), 10.1016/B978-1-55860-377-6.50052-9
R. Andrew McCallum, Overcoming incomplete perception with utile distinction memory, International Conference on Machine Learning, pp. 190-196 (1993), 10.1016/B978-1-55860-307-3.50031-9
E. J. Sondik, The Optimal Control of Partially Observable Markov Decision Processes, PhD thesis, Stanford University (1971)
Long-Ji Lin, Tom M. Mitchell, Memory Approaches to Reinforcement Learning in Non-Markovian Domains, Carnegie Mellon University (1992)
Lonnie Chrisman, Reinforcement learning with perceptual aliasing: the perceptual distinctions approach, National Conference on Artificial Intelligence, pp. 183-188 (1992)
William S. Lovejoy, A survey of algorithmic methods for partially observed Markov decision processes, Annals of Operations Research, vol. 28, pp. 47-66 (1991), 10.1007/BF02055574
Anthony R. Cassandra, Leslie Pack Kaelbling, Michael L. Littman, Acting Optimally in Partially Observable Stochastic Domains, National Conference on Artificial Intelligence, pp. 1023-1028 (1994)
Richard S. Sutton, Learning to Predict by the Methods of Temporal Differences, Machine Learning, vol. 3, pp. 9-44 (1988), 10.1023/A:1022633531479
Tommi Jaakkola, Satinder P. Singh, Michael I. Jordan, Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, Neural Information Processing Systems, vol. 7, pp. 345-352 (1994)