Greedy Layer-Wise Training of Deep Networks

Authors: Yoshua Bengio, Hugo Larochelle, Pascal Lamblin, Dan Popovici

Abstract: Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of the computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and to extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
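The abstract describes the algorithm only at a high level. As a concrete illustration, below is a minimal NumPy sketch of greedy layer-wise pretraining with stacked Restricted Boltzmann Machines trained by one-step contrastive divergence (CD-1), the procedure used for DBNs. The names (`RBM`, `pretrain_greedy`), layer sizes, and hyperparameters are illustrative choices, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann Machine with binary units, trained by CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_update(self, v0):
        """One contrastive-divergence (CD-1) step on a minibatch v0."""
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hiddens
        v1 = self.visible_probs(h0)                        # reconstruction
        ph1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W   += self.lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

def pretrain_greedy(data, layer_sizes, epochs=10, batch=32):
    """Greedily train a stack of RBMs: each layer's hidden
    probabilities become the next layer's training data."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            for i in range(0, len(x), batch):
                rbm.cd1_update(x[i:i + batch])
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # propagate the data one layer up
    return rbms

# Usage: pretrain a 784-256-64 stack on random binary "data".
if __name__ == "__main__":
    X = (rng.random((512, 784)) < 0.5).astype(float)
    stack = pretrain_greedy(X, layer_sizes=[256, 64], epochs=2)
    print([r.W.shape for r in stack])  # [(784, 256), (256, 64)]
```

In the setting the abstract describes, the pretrained weights would then initialize a deep feed-forward network that is fine-tuned with supervised gradient descent; this sketch stops after the unsupervised pretraining phase.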

References (16)
Eric Allender, Circuit Complexity before the Dawn of the New Millennium. Foundations of Software Technology and Theoretical Computer Science, pp. 1-18 (1996). DOI: 10.1007/3-540-62034-6_33
Régis Lengellé, Thierry Denœux, Training MLPs layer by layer using an objective function for internal representations. Neural Networks, vol. 9, pp. 83-97 (1996). DOI: 10.1016/0893-6080(95)00096-8
G. Hinton, P. Dayan, B. Frey, R. Neal, The "Wake-Sleep" Algorithm for Unsupervised Neural Networks. Science, vol. 268, pp. 1158-1161 (1995). DOI: 10.1126/SCIENCE.7761831
Geoffrey E. Hinton, Ruslan R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks. Science, vol. 313, pp. 504-507 (2006). DOI: 10.1126/SCIENCE.1127647
Javier R. Movellan, Paul Mineiro, R. J. Williams, A Monte Carlo EM Approach for Partially Observable Diffusion Processes: Theory and Applications to Neural Networks. Neural Computation, vol. 14, pp. 1507-1544 (2002). DOI: 10.1162/08997660260028593
Gerald Tesauro, Practical Issues in Temporal Difference Learning. Machine Learning, vol. 8, pp. 257-277 (1992). DOI: 10.1007/BF00992697
Scott E. Fahlman, Christian Lebiere, The Cascade-Correlation Learning Architecture. Neural Information Processing Systems, vol. 2, pp. 524-532 (1989).
Geoffrey E. Hinton, Training products of experts by minimizing contrastive divergence. Neural Computation, vol. 14, pp. 1771-1800 (2002). DOI: 10.1162/089976602760128018
Geoffrey E. Hinton, Michal Rosen-Zvi, Max Welling, Exponential Family Harmoniums with an Application to Information Retrieval. Neural Information Processing Systems, vol. 17, pp. 1481-1488 (2004).