Optimization as a Model for Few-Shot Learning

Authors: Sachin Ravi, Hugo Larochelle

Keywords: Parametrization; Machine learning; Convergence (routing); Artificial neural network; Artificial intelligence; Computer science; Initialization; Shot (filmmaking); Set (abstract data type); Data domain; Class (computer programming)

Abstract: … models requires many iterative updates across many labeled … have to start from a random initialization of its parameters, … during meta-test, but then erase the running statistics when we …
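Although the abstract is only partially recoverable here, the paper's central idea is known: an LSTM-based meta-learner produces the parameter updates for a few-shot learner, with the LSTM cell state standing in for the learner's parameters, so that training the meta-learner jointly learns both a good initialization and a gradient-descent-like update rule of the form θ_t = f_t ⊙ θ_{t−1} − i_t ⊙ ∇L_t. Below is a minimal sketch of that gated update; the layer sizes, gate inputs, and all names are illustrative assumptions, not the authors' code:

```python
import torch
import torch.nn as nn

class MetaLearnerCell(nn.Module):
    """Sketch of the LSTM-style meta-learner update from
    'Optimization as a Model for Few-Shot Learning':

        theta_t = f_t * theta_{t-1} + i_t * (-grad_t)

    The forget gate f_t decides how much of the previous parameter value
    to keep; the input gate i_t acts as a learned, per-parameter step size.
    Gate inputs and hidden sizes here are assumptions for illustration.
    """
    def __init__(self, hidden: int = 20):
        super().__init__()
        # Each gate sees [gradient, loss, previous parameter, previous gate
        # value], applied coordinate-wise so the cell is shared across all
        # learner parameters.
        self.f_gate = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.i_gate = nn.Sequential(nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, theta, grad, loss, prev_f, prev_i):
        # theta, grad, prev_f, prev_i: (num_params, 1); loss: scalar tensor,
        # broadcast to every coordinate.
        loss_col = loss.expand_as(theta)
        feats_f = torch.cat([grad, loss_col, theta, prev_f], dim=1)
        feats_i = torch.cat([grad, loss_col, theta, prev_i], dim=1)
        f_t = torch.sigmoid(self.f_gate(feats_f))  # how much of theta_{t-1} to keep
        i_t = torch.sigmoid(self.i_gate(feats_i))  # learned per-parameter step size
        theta_next = f_t * theta - i_t * grad      # i_t * (-grad), as in the update rule
        return theta_next, f_t, i_t

# Toy usage: 5 learner parameters, one meta-learner step.
cell = MetaLearnerCell()
theta = torch.randn(5, 1)
grad = torch.randn(5, 1)
loss = torch.tensor(0.7)
prev_f = torch.ones(5, 1)   # start by fully keeping theta
prev_i = torch.zeros(5, 1)  # and taking no gradient step
theta, f_t, i_t = cell(theta, grad, loss, prev_f, prev_i)
```

The abstract's fragment about erasing running statistics most likely refers to batch-normalization statistics being reset between meta-test episodes so that no information leaks across tasks; the truncated text does not say more.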

References (20)
Matthew D. Zeiler, ADADELTA: An Adaptive Learning Rate Method. arXiv preprint (2012).
Ilya Sutskever, Rafal Jozefowicz, Wojciech Zaremba, An Empirical Exploration of Recurrent Network Architectures. International Conference on Machine Learning, pp. 2342–2350 (2015).
Jürgen Schmidhuber, Jieyu Zhao, Marco Wiering, Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement. Machine Learning, vol. 28, pp. 105–130 (1997). DOI: 10.1023/A:1007383707642
Diederik P. Kingma, Jimmy Ba, Adam: A Method for Stochastic Optimization. arXiv preprint (2014).
David Duvenaud, Ryan P. Adams, Dougal Maclaurin, Gradient-based Hyperparameter Optimization through Reversible Learning. arXiv preprint (2015).
Sepp Hochreiter, Jürgen Schmidhuber, Long Short-Term Memory. Neural Computation, vol. 9, pp. 1735–1780 (1997). DOI: 10.1162/NECO.1997.9.8.1735
J. Schmidhuber, A Neural Network That Embeds Its Own Meta-Levels. IEEE International Conference on Neural Networks, pp. 407–412 (1993). DOI: 10.1109/ICNN.1993.298591
Sebastian Thrun, Lifelong Learning Algorithms. Learning to Learn, pp. 181–209 (1998). DOI: 10.1007/978-1-4615-5529-2_8
Sepp Hochreiter, A. Steven Younger, Peter R. Conwell, Learning to Learn Using Gradient Descent. International Conference on Artificial Neural Networks, pp. 87–94 (2001). DOI: 10.1007/3-540-44668-0_13