Authors: Xaq Pitkow, Paul Schrater, Minhae Kwon, Saurabh Daptardar
DOI:
Keywords: Gradient descent, Nonlinear system, Markov decision process, Observable, Computer science, Internal model, Bayesian probability, Mathematical optimization, Reinforcement learning, Latent variable
Abstract: A fundamental question in neuroscience is how the brain creates an internal model of the world to guide actions using sequences of ambiguous sensory information. This is naturally formulated as a reinforcement learning problem under partial observations, where an agent must estimate relevant latent variables in the world from its evidence, anticipate possible future states, and choose actions that optimize total expected reward. This problem can be solved by control theory, which allows us to find the optimal actions for a given system dynamics and objective function. However, animals often appear to behave suboptimally. Why? We hypothesize that animals have their own flawed internal model of the world, and choose actions with the highest expected subjective reward according to that model. We describe this behavior as rational but not optimal. The problem of Inverse Rational Control (IRC) aims to identify which internal model would best explain an agent's actions. Our contribution here generalizes past work on IRC for discrete partially observable Markov decision processes. Here we accommodate continuous nonlinear dynamics and continuous actions, and impute sensory observations corrupted by unknown noise that is private to the animal. We first build an optimal Bayesian agent that learns a policy generalized over the entire space of rewards using deep learning. Crucially, this agent can be used to compute the likelihood of models given experimentally observable action trajectories acquired from a suboptimal agent. We then find the model parameters that maximize this likelihood using gradient ascent.
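The core of the fitting procedure described above is maximizing the likelihood of a candidate internal model given observed action trajectories. The sketch below illustrates that idea on a toy one-dimensional task; it is not the authors' code. The deep-learning policy trained over the whole model space is replaced by a hand-written stand-in (predicted_action), and the parameter theta, the Gaussian action noise of 0.05, and all other names are illustrative assumptions.

```python
# Minimal sketch of the IRC fitting loop on a toy problem (assumptions:
# a hand-rolled stand-in policy; theta is the agent's subjective model
# parameter, recovered by gradient ascent on the action log-likelihood).
import torch

torch.manual_seed(0)

def predicted_action(theta, obs):
    """Stand-in for the belief-space policy pi_theta: a rational agent
    that scales its observation by a gain determined by its model theta.
    In the paper this policy is learned over the whole model space."""
    gain = theta / (1.0 + theta)
    return gain * obs

# --- "Experimental" trajectories from a suboptimal (but rational) agent ---
theta_animal = torch.tensor(0.3)            # the animal's private model
obs = torch.randn(500)                      # stimuli shown to the agent
actions = predicted_action(theta_animal, obs) + 0.05 * torch.randn(500)

# --- IRC: gradient ascent on the likelihood of theta given the actions ---
theta = torch.tensor(1.0, requires_grad=True)   # initial guess at the model
opt = torch.optim.Adam([theta], lr=0.05)
for step in range(500):
    opt.zero_grad()
    mu = predicted_action(theta, obs)
    # Gaussian action likelihood centered on the model-predicted action
    log_lik = torch.distributions.Normal(mu, 0.05).log_prob(actions).sum()
    (-log_lik).backward()                   # minimize the negative = ascend
    opt.step()

print(f"recovered theta = {theta.item():.3f} (true {theta_animal.item():.3f})")
```

In the method the abstract describes, predicted_action would instead come from the Bayesian agent's policy learned over the entire space of reward parameters, so the same gradient-ascent loop can be reused for any candidate model without retraining.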