Authors: Gabriel Kreiman, David Cox, William Lotter
DOI:
Keywords:
Abstract: While great strides have been made in using deep learning algorithms to solve supervised learning tasks, the problem of unsupervised learning - leveraging unlabeled examples to learn about the structure of a domain - remains a difficult unsolved challenge. Here, we explore prediction of future frames in a video sequence as an unsupervised learning rule for learning about the structure of the visual world. We describe a predictive neural network ("PredNet") architecture that is inspired by the concept of "predictive coding" from the neuroscience literature. These networks learn to predict future frames in a video sequence, with each layer making local predictions and only forwarding deviations from those predictions to subsequent layers. We show that these networks are able to robustly learn to predict the movement of synthetic (rendered) objects, and that, in doing so, they learn internal representations that are useful for decoding latent object parameters (e.g. pose) and support object recognition from fewer training views. We also show that these networks can scale to complex natural image streams (car-mounted camera videos), capturing key aspects of both egocentric movement and the movement of objects in the scene, and that the representation learned in this setting is useful for estimating the steering angle. Altogether, these results suggest that prediction represents a powerful framework for unsupervised learning, allowing for implicit learning of object and scene structure.
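To make the core mechanism in the abstract concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the predictive-coding idea: each layer predicts its own input and forwards only the prediction error (deviation) to the next layer. The linear "predictors" here are hypothetical stand-ins for PredNet's recurrent convolutional modules.

```python
# Minimal sketch of layer-wise prediction with error forwarding.
# Assumption: simple linear layers stand in for PredNet's ConvLSTM units.
import numpy as np

rng = np.random.default_rng(0)

def make_layer(in_dim, out_dim):
    # Hypothetical linear predictor; the real model uses learned recurrent filters.
    return rng.standard_normal((out_dim, in_dim)) * 0.1

def forward(layers, frame):
    """Pass a frame up the hierarchy, propagating only prediction errors."""
    signal = frame
    errors = []
    for W in layers:
        prediction = W.T @ (W @ signal)   # the layer's reconstruction of its input
        error = signal - prediction       # deviation from the local prediction
        errors.append(error)
        signal = W @ error                # only the error is forwarded upward
    return errors

layers = [make_layer(64, 32), make_layer(32, 16)]
frame = rng.standard_normal(64)           # stand-in for a flattened video frame
errors = forward(layers, frame)
print([e.shape for e in errors])          # per-layer error shapes: (64,), (32,)
```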