Greedy Layer-Wise Training of Deep Networks

Authors: Yoshua Bengio, Hugo Larochelle, Pascal Lamblin, Dan Popovici

Abstract: Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of the computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and to extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
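The abstract describes the algorithm only at a high level. As a concrete illustration, below is a minimal NumPy sketch of greedy layer-wise pretraining with stacked Restricted Boltzmann Machines trained by one-step contrastive divergence (CD-1), the procedure used for DBNs. The names (`RBM`, `pretrain_greedy`), layer sizes, and hyperparameters are illustrative choices, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann Machine with binary units, trained by CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_update(self, v0):
        """One contrastive-divergence (CD-1) step on a minibatch v0."""
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hiddens
        v1 = self.visible_probs(h0)                        # reconstruction
        ph1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W   += self.lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

def pretrain_greedy(data, layer_sizes, epochs=10, batch=32):
    """Greedily train a stack of RBMs: each layer's hidden
    probabilities become the next layer's training data."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            for i in range(0, len(x), batch):
                rbm.cd1_update(x[i:i + batch])
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # propagate the data one layer up
    return rbms

# Usage: pretrain a 784-256-64 stack on random binary "data".
if __name__ == "__main__":
    X = (rng.random((512, 784)) < 0.5).astype(float)
    stack = pretrain_greedy(X, layer_sizes=[256, 64], epochs=2)
    print([r.W.shape for r in stack])  # [(784, 256), (256, 64)]
```

In the setting the abstract describes, the pretrained weights would then initialize a deep feed-forward network that is fine-tuned with supervised gradient descent; this sketch stops after the unsupervised pretraining phase.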

References (16)
Eric Allender, Circuit Complexity before the Dawn of the New Millennium. Foundations of Software Technology and Theoretical Computer Science, pp. 1-18 (1996). DOI: 10.1007/3-540-62034-6_33
Régis Lengellé, Thierry Denœux, Training MLPs layer by layer using an objective function for internal representations. Neural Networks, vol. 9, pp. 83-97 (1996). DOI: 10.1016/0893-6080(95)00096-8
G. Hinton, P. Dayan, B. Frey, R. Neal, The "Wake-Sleep" Algorithm for Unsupervised Neural Networks. Science, vol. 268, pp. 1158-1161 (1995). DOI: 10.1126/SCIENCE.7761831
Geoffrey E. Hinton, Ruslan R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks. Science, vol. 313, pp. 504-507 (2006). DOI: 10.1126/SCIENCE.1127647
Javier R. Movellan, Paul Mineiro, R. J. Williams, A Monte Carlo EM Approach for Partially Observable Diffusion Processes: Theory and Applications to Neural Networks. Neural Computation, vol. 14, pp. 1507-1544 (2002). DOI: 10.1162/08997660260028593
Gerald Tesauro, Practical Issues in Temporal Difference Learning. Machine Learning, vol. 8, pp. 257-277 (1992). DOI: 10.1007/BF00992697
Scott E. Fahlman, Christian Lebiere, The Cascade-Correlation Learning Architecture. Neural Information Processing Systems, vol. 2, pp. 524-532 (1989).
Geoffrey E. Hinton, Training products of experts by minimizing contrastive divergence. Neural Computation, vol. 14, pp. 1771-1800 (2002). DOI: 10.1162/089976602760128018
Geoffrey E. Hinton, Michal Rosen-Zvi, Max Welling, Exponential Family Harmoniums with an Application to Information Retrieval. Neural Information Processing Systems, vol. 17, pp. 1481-1488 (2004).