Abstract: For reinforcement learning in environments in which an agent has access to a reliable state signal, methods based on the Markov decision process (MDP) have had many successes. In many problem domains, however, an agent suffers from limited sensing capabilities that preclude it from recovering a Markovian state signal from its perceptions. Extending the MDP framework, partially observable Markov decision processes (POMDPs) allow for principled decision making under conditions of uncertain sensing. In this chapter we present the POMDP model by focusing on the differences with fully observable MDPs, and we show how optimal policies for POMDPs can be represented. Next, we give a review of model-based techniques for policy computation, followed by an overview of the available model-free methods for POMDPs. We conclude by highlighting recent trends in POMDP reinforcement learning.
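As a minimal formal sketch of the model the abstract refers to (using a common notation, not necessarily the chapter's own): a POMDP is a tuple $\langle S, A, \Omega, T, O, R \rangle$, where $T(s, a, s') = \Pr(s' \mid s, a)$ is the transition model over states $S$, $O(s', a, o) = \Pr(o \mid s', a)$ the observation model over observations $\Omega$, and $R(s, a)$ the reward function. Because the state is not directly observed, the agent maintains a belief $b$, a probability distribution over $S$, updated by Bayes' rule after taking action $a$ and receiving observation $o$:

\[
b'(s') \;=\; \frac{O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}{\Pr(o \mid b, a)} ,
\qquad
\Pr(o \mid b, a) \;=\; \sum_{s' \in S} O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s) .
\]

This belief update is the mechanism behind the "principled decision making under conditions of uncertain sensing" mentioned above: policies act on beliefs rather than on the hidden state.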