Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics

Authors: Xaq Pitkow, Paul Schrater, Minhae Kwon, Saurabh Daptardar

DOI:

Keywords: Gradient descent; Nonlinear system; Markov decision process; Observable; Computer science; Internal model; Bayesian probability; Mathematical optimization; Reinforcement learning; Latent variable

Abstract: A fundamental question in neuroscience is how the brain creates an internal model of the world to guide actions using sequences of ambiguous sensory information. This is naturally formulated as a reinforcement learning problem under partial observations, where an agent must estimate relevant latent variables from its evidence, anticipate possible future states, and choose actions that optimize total expected reward. This problem can be solved by control theory, which allows us to find the optimal actions for a given system dynamics and objective function. However, animals often appear to behave suboptimally. Why? We hypothesize that animals have their own flawed internal model of the world, and choose actions with the highest subjective reward according to that model. We describe this behavior as rational but not optimal. The Inverse Rational Control (IRC) problem aims to identify which internal model would best explain an agent's actions. Our contribution here generalizes past work on IRC for discrete partially observable Markov decision processes. Here we accommodate continuous nonlinear dynamics and continuous actions, and impute sensory observations corrupted by unknown noise that is private to the animal. We first build a Bayesian agent that learns a policy generalized over the entire space of rewards using deep learning. Crucially, this allows us to compute a likelihood over models for experimentally observable action trajectories acquired from a suboptimal agent. We then find the model parameters that maximize this likelihood using gradient ascent.
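As a minimal sketch of the final step described above (not the authors' implementation), the toy example below assumes a one-parameter internal model theta, a hypothetical policy_mean function, and a Gaussian action-noise model, then recovers theta by gradient ascent on the log-likelihood of observed action trajectories:

```python
# Toy IRC sketch: recover an agent's internal-model parameter by maximizing
# the likelihood of its observed actions. All names (theta, policy_mean,
# action_noise) are illustrative assumptions, not the paper's code.

import torch

torch.manual_seed(0)

def policy_mean(obs, theta):
    # Hypothetical rational policy: the action the agent believes is best
    # given its observation and its subjective model parameter theta.
    return torch.tanh(theta * obs)

# Simulate "experimental" trajectories from an agent with an unknown model.
theta_true = torch.tensor(1.7)
action_noise = 0.1
obs = torch.randn(500)                                   # observed stimuli
actions = policy_mean(obs, theta_true) + action_noise * torch.randn(500)

def log_likelihood(theta):
    # Gaussian log-likelihood of the observed actions under the candidate model.
    mu = policy_mean(obs, theta)
    return -0.5 * torch.sum((actions - mu) ** 2) / action_noise ** 2

# Gradient ascent on the log-likelihood over internal-model parameters.
theta = torch.tensor(0.5, requires_grad=True)
optimizer = torch.optim.Adam([theta], lr=0.05)
for step in range(300):
    optimizer.zero_grad()
    loss = -log_likelihood(theta)                        # minimize negative LL
    loss.backward()
    optimizer.step()

print(f"recovered theta = {theta.item():.3f} (true = {theta_true.item():.3f})")
```

In the paper's setting the likelihood would be computed over full belief-state trajectories of a trained Bayesian agent rather than this one-dimensional toy, but the parameter-fitting loop follows the same pattern.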
