Authors: Adam S. Lowet, Qiao Zheng, Sara Matias, Jan Drugowitsch, Naoshige Uchida
DOI: 10.1016/J.TINS.2020.09.004
Keywords:
Abstract: Learning about rewards and punishments is critical for survival. Classical studies have demonstrated an impressive correspondence between the firing of dopamine neurons in the mammalian midbrain and the reward prediction errors of reinforcement learning algorithms, which express the difference between actual reward and predicted mean reward. However, it may be advantageous to learn not only the mean but also the complete distribution of potential rewards. Recent advances in machine learning have revealed a biologically plausible set of algorithms for reconstructing this reward distribution from experience. Here, we review the mathematical foundations of these algorithms as well as initial evidence for their neurobiological implementation. We conclude by highlighting outstanding questions regarding the circuit computation and behavioral readout of these distributional codes.
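The contrast between learning the mean reward and learning the full reward distribution can be illustrated with a minimal sketch. It is not the authors' model. It assumes one scheme from the distributional reinforcement learning literature, an expectile-style rule: a population of value predictors, each of which scales positive and negative reward prediction errors by different learning rates. The helper `learn_values`, the bimodal reward distribution, and all parameter values are hypothetical. A predictor with equal rates converges near the mean. A predictor with optimism `tau = alpha_pos / (alpha_pos + alpha_neg)` converges near the tau-th expectile, which for rewards of 1 or 9 with equal probability is `1 + 8 * tau`.

```python
import numpy as np


def learn_values(rewards, alpha_pos, alpha_neg):
    """Hypothetical helper: update one value prediction per learning-rate pair.

    Each prediction V is nudged by the reward prediction error delta = r - V,
    with positive errors scaled by alpha_pos and negative errors by alpha_neg.
    """
    values = np.zeros(len(alpha_pos))
    for r in rewards:
        delta = r - values                          # reward prediction errors
        lr = np.where(delta > 0, alpha_pos, alpha_neg)
        values += lr * delta                        # asymmetric update
    return values


rng = np.random.default_rng(0)
# Assumed bimodal reward distribution: 1 or 9 with equal probability (mean 5).
rewards = rng.choice([1.0, 9.0], size=20000)

# Classical case: symmetric learning rates, so the prediction tracks the mean.
mean_estimate = learn_values(rewards, np.array([0.002]), np.array([0.002]))[0]

# Distributional case: asymmetric rates give each predictor a different
# optimism tau; analytically its fixed point is the expectile 1 + 8 * tau.
taus = np.array([0.1, 0.5, 0.9])
scale = 0.004
expectiles = learn_values(rewards, scale * taus, scale * (1 - taus))

print(round(mean_estimate, 2), np.round(expectiles, 2))
```

A single symmetric predictor reports only that the average reward is about 5. The asymmetric population spreads out, with low-tau predictors near the lower mode and high-tau predictors near the upper one, so together they carry information about the shape of the distribution rather than just its mean.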