Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics

Authors: Xaq Pitkow, Paul Schrater, Minhae Kwon, Saurabh Daptardar

DOI:

Keywords: Gradient descent; Nonlinear system; Markov decision process; Observable; Computer science; Internal model; Bayesian probability; Mathematical optimization; Reinforcement learning; Latent variable

Abstract: A fundamental question in neuroscience is how the brain creates an internal model of the world to guide actions using sequences of ambiguous sensory information. This is naturally formulated as a reinforcement learning problem under partial observations, where an agent must estimate relevant latent variables from its evidence, anticipate possible future states, and choose actions that optimize total expected reward. This problem can be solved by control theory, which allows us to find the optimal actions for a given system dynamics and objective function. However, animals often appear to behave suboptimally. Why? We hypothesize that animals have their own flawed internal model of the world, and choose actions with the highest subjective reward according to that model. We describe this behavior as rational but not optimal. The Inverse Rational Control (IRC) problem aims to identify which internal model would best explain an agent's actions. Our contribution here generalizes past work on IRC for discrete partially observable Markov decision processes. Here we accommodate continuous nonlinear dynamics and continuous actions, and impute sensory observations corrupted by unknown noise that is private to the animal. We first build a Bayesian agent that learns a policy generalized over the entire space of rewards using deep learning. Crucially, this allows us to compute a likelihood over models for experimentally observable action trajectories acquired from a suboptimal agent. We then find the model parameters that maximize this likelihood using gradient ascent.
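As a minimal sketch of the final step described above (not the authors' implementation), the toy example below assumes a one-parameter internal model theta, a hypothetical policy_mean function, and a Gaussian action-noise model, then recovers theta by gradient ascent on the log-likelihood of observed action trajectories:

```python
# Toy IRC sketch: recover an agent's internal-model parameter by maximizing
# the likelihood of its observed actions. All names (theta, policy_mean,
# action_noise) are illustrative assumptions, not the paper's code.

import torch

torch.manual_seed(0)

def policy_mean(obs, theta):
    # Hypothetical rational policy: the action the agent believes is best
    # given its observation and its subjective model parameter theta.
    return torch.tanh(theta * obs)

# Simulate "experimental" trajectories from an agent with an unknown model.
theta_true = torch.tensor(1.7)
action_noise = 0.1
obs = torch.randn(500)                                   # observed stimuli
actions = policy_mean(obs, theta_true) + action_noise * torch.randn(500)

def log_likelihood(theta):
    # Gaussian log-likelihood of the observed actions under the candidate model.
    mu = policy_mean(obs, theta)
    return -0.5 * torch.sum((actions - mu) ** 2) / action_noise ** 2

# Gradient ascent on the log-likelihood over internal-model parameters.
theta = torch.tensor(0.5, requires_grad=True)
optimizer = torch.optim.Adam([theta], lr=0.05)
for step in range(300):
    optimizer.zero_grad()
    loss = -log_likelihood(theta)                        # minimize negative LL
    loss.backward()
    optimizer.step()

print(f"recovered theta = {theta.item():.3f} (true = {theta_true.item():.3f})")
```

In the paper's setting the likelihood would be computed over full belief-state trajectories of a trained Bayesian agent rather than this one-dimensional toy, but the parameter-fitting loop follows the same pattern.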
