Authors: Razvan Pascanu, Hujun Yin, Raia Hadsell, Francesco Visin, Andrei A. Rusu
DOI:
Keywords:
Abstract: Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a neural network that directly produces updates, or by attempting to learn better initialisations or scaling factors for a gradient-based update rule. Both of these approaches pose challenges. On the one hand, directly producing an update forgoes a useful inductive bias and can easily lead to non-converging behaviour. On the other hand, approaches that try to control a gradient-based update rule typically resort to computing gradients through the learning process to obtain their meta-gradients, leading to methods that cannot scale beyond few-shot task adaptation. In this work, we propose Warped Gradient Descent (WarpGrad), a method that intersects these approaches to mitigate their limitations. WarpGrad meta-learns an efficiently parameterised preconditioning matrix that facilitates gradient descent across the task distribution. Preconditioning arises by interleaving non-linear layers, referred to as warp-layers, between the layers of the task-learner. Warp-layers are meta-learned without backpropagating through the task training process, in a manner similar to methods that learn to directly produce updates. WarpGrad is computationally efficient, easy to implement, and can scale to arbitrarily large meta-learning problems. We provide a geometrical interpretation of the approach and evaluate its effectiveness in a variety of settings, including few-shot, standard supervised, continual, and reinforcement learning.
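To make the core idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of how warp-layers can be interleaved with a task-learner's layers in PyTorch. All names (WarpedTaskLearner, warp1, adapt_on_task) are illustrative assumptions; the point is only that the warp-layers are held fixed during task adaptation, so gradients flowing through them effectively precondition the task-parameter updates, while the warp parameters themselves are trained only in an outer meta-learning loop.

```python
import torch
import torch.nn as nn

class WarpedTaskLearner(nn.Module):
    """Illustrative task-learner with a non-linear warp-layer interleaved
    between its task-adapted layers. Warp parameters are frozen during
    task adaptation and updated only by the outer meta-learner."""

    def __init__(self, in_dim=784, hidden=64, out_dim=10):
        super().__init__()
        # Task layers: adapted by ordinary gradient descent on each task.
        self.task1 = nn.Linear(in_dim, hidden)
        self.task2 = nn.Linear(hidden, out_dim)
        # Warp-layer: shared across tasks, meta-learned, fixed in the inner loop.
        self.warp1 = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh())

    def forward(self, x):
        h = torch.relu(self.task1(x))
        h = self.warp1(h)  # gradients through this fixed map act as preconditioning
        return self.task2(h)

    def task_parameters(self):
        return list(self.task1.parameters()) + list(self.task2.parameters())

    def warp_parameters(self):
        return list(self.warp1.parameters())


def adapt_on_task(model, x, y, steps=5, lr=0.1):
    """Inner loop: plain SGD on the task parameters only; the warp-layers
    are not updated here, yet they reshape the gradients of the task layers."""
    opt = torch.optim.SGD(model.task_parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```

In this sketch the meta-objective over warp_parameters() is left out; per the abstract, WarpGrad trains the warp-layers without backpropagating through the inner adaptation trajectory, which is what allows it to scale beyond few-shot adaptation.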