Authors: Stuart Russell, Ronald Parr
DOI:
Keywords: Markov decision process, Q-learning, State (functional analysis), Mathematical optimization, Observable, Partially observable Markov decision process, Test case, Reinforcement learning, Mathematics, Markov model
Abstract: The problem of making optimal decisions in uncertain conditions is central to Artificial Intelligence. If the state of the world is known at all times, the world can be modeled as a Markov Decision Process (MDP). MDPs have been studied extensively, and many methods are known for determining optimal courses of action, or policies. The more realistic case, in which state information is only partially observable, is modeled by Partially Observable Markov Decision Processes (POMDPs), which have received much less attention. The best exact algorithms for these problems are very inefficient in both space and time. We introduce the Smooth Partially Observable Value Approximation (SPOVA), a new approximation method that can quickly yield good approximations and improve them over time. This method can be combined with reinforcement learning methods, a combination that was effective in our test cases.
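As a rough illustration of the idea behind a smooth value approximation over belief states, the sketch below replaces the hard max over a set of alpha vectors (the exact POMDP value function is piecewise linear and convex in the belief) with a differentiable power-mean surrogate that gradient-based, reinforcement-learning-style updates could adjust. This is a minimal sketch under stated assumptions, not the authors' exact SPOVA algorithm; the variable names, the number of vectors, and the specific smoothing formula are illustrative choices.

```python
import numpy as np

# Minimal sketch (not the paper's exact method): the optimal POMDP value function
# over belief states b is a max of linear pieces, V(b) = max_i alpha_i . b.
# A smooth surrogate replaces the hard max so the alpha vectors can be tuned by
# gradient updates in the spirit of reinforcement learning.

rng = np.random.default_rng(0)

n_states = 4      # hypothetical number of hidden states
n_vectors = 3     # hypothetical number of alpha vectors in the approximation
k = 8.0           # smoothing exponent: larger k brings the surrogate closer to a hard max

# Illustrative nonnegative alpha vectors (one value per hidden state).
alphas = rng.uniform(0.1, 1.0, size=(n_vectors, n_states))

def smooth_value(belief, alphas, k):
    """Differentiable approximation of max_i alpha_i . belief (power-mean soft max)."""
    dots = alphas @ belief                 # value of each linear piece at this belief
    return np.sum(dots ** k) ** (1.0 / k)  # tends to max(dots) as k grows

def hard_value(belief, alphas):
    """Exact piecewise-linear value, for comparison."""
    return np.max(alphas @ belief)

belief = np.array([0.5, 0.2, 0.2, 0.1])    # a belief state: distribution over hidden states
print("smooth approximation:", smooth_value(belief, alphas, k))
print("hard max            :", hard_value(belief, alphas))
```

Because the surrogate is differentiable everywhere, its gradient with respect to each alpha vector is well defined at every belief, which is what makes update rules driven by sampled belief states feasible; the hard max, by contrast, passes gradient only to the single dominating vector.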