Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning

Authors: Gabriel Kreiman, David Cox, William Lotter

Abstract: While great strides have been made in using deep learning algorithms to solve supervised learning tasks, the problem of unsupervised learning - leveraging unlabeled examples to learn about the structure of a domain - remains a difficult unsolved challenge. Here, we explore prediction of future frames in a video sequence as an unsupervised learning rule for learning about the structure of the visual world. We describe a predictive neural network ("PredNet") architecture that is inspired by the concept of "predictive coding" from the neuroscience literature. These networks learn to predict future frames in a video sequence, with each layer in the network making local predictions and only forwarding deviations from those predictions to subsequent network layers. We show that these networks are able to robustly learn to predict the movement of synthetic (rendered) objects, and that in doing so, the networks learn internal representations that are useful for decoding latent object parameters (e.g. pose) that support object recognition with fewer training views. We also show that these networks can scale to complex natural image streams (car-mounted camera videos), capturing key aspects of both egocentric movement and the movement of objects in the visual scene, and the representation learned in this setting is useful for estimating the steering angle. Altogether, these results suggest that prediction represents a powerful framework for unsupervised learning, allowing for implicit learning of object and scene structure.
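
The idea that each layer makes a local prediction and forwards only the deviation can be illustrated with a minimal sketch. This is not the PredNet implementation: the names `ToyLayer` and `run_hierarchy` are hypothetical, frames are plain lists of floats, and each layer's "prediction" is assumed to be simply its previous input, standing in for the learned predictors the abstract refers to.

```python
from typing import List

Frame = List[float]


class ToyLayer:
    """One layer that predicts its next input and emits the prediction error.

    Assumption: the prediction is just the previous input (a copy-last
    predictor), used here only as a stand-in for a learned predictor.
    """

    def __init__(self, size: int) -> None:
        self.prediction: Frame = [0.0] * size

    def step(self, inputs: Frame) -> Frame:
        # Deviation between what arrived and what this layer expected.
        error = [x - p for x, p in zip(inputs, self.prediction)]
        # Update the local prediction for the next time step.
        self.prediction = list(inputs)
        return error


def run_hierarchy(frames: List[Frame], n_layers: int = 2) -> List[List[Frame]]:
    """Feed frames through stacked layers.

    Each layer above the first sees only the error forwarded by the layer
    below. Returns a nested list indexed as errors[time_step][layer].
    """
    size = len(frames[0])
    layers = [ToyLayer(size) for _ in range(n_layers)]
    history = []
    for frame in frames:
        signal = frame
        errors_at_t = []
        for layer in layers:
            signal = layer.step(signal)  # forward only the deviation
            errors_at_t.append(signal)
        history.append(errors_at_t)
    return history


if __name__ == "__main__":
    # A static "video": the same frame repeated four times.
    static_video = [[1.0, 2.0, 3.0] for _ in range(4)]
    for t, errs in enumerate(run_hierarchy(static_video)):
        print(t, errs)
```

For a static sequence like the one above, the errors at both layers fall to zero once the predictions have caught up with the input, so nothing further is forwarded up the hierarchy. A moving input would keep producing non-zero deviations until the predictor accounts for the motion.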
