Authors: Yoshua Bengio, Hugo Larochelle, Pascal Lamblin, Dan Popovici
DOI:
Keywords:
Abstract: Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of the computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities, allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and to extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
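As a rough illustration of the greedy layer-wise idea summarized in the abstract, the sketch below pretrains each layer unsupervised on the representation produced by the layers beneath it, then stacks the learned weights as an initialization for a deep network. It uses simple tied-weight autoencoders rather than the RBMs of a Deep Belief Network, and the layer sizes, learning rate, and toy data are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of greedy layer-wise unsupervised pretraining (illustrative;
# not the authors' exact DBN/RBM procedure). Each layer is trained as a small
# tied-weight autoencoder on the output of the layers below it.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(data, n_hidden, lr=0.1, epochs=10):
    """Train a one-hidden-layer autoencoder with tied weights on `data`;
    return the encoder weights and hidden biases."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        for x in data:
            h = sigmoid(x @ W + b_h)            # encode
            x_hat = sigmoid(h @ W.T + b_v)      # decode (tied weights)
            # gradients of squared reconstruction error, by hand
            d_v = (x_hat - x) * x_hat * (1 - x_hat)
            d_h = (d_v @ W) * h * (1 - h)
            W -= lr * (np.outer(x, d_h) + np.outer(d_v, h))
            b_h -= lr * d_h
            b_v -= lr * d_v
    return W, b_h

def greedy_pretrain(data, layer_sizes):
    """Pretrain a stack of layers greedily, one at a time, bottom-up."""
    params, rep = [], data
    for n_hidden in layer_sizes:
        W, b = pretrain_layer(rep, n_hidden)
        params.append((W, b))
        rep = sigmoid(rep @ W + b)  # representation fed to the next layer
    return params  # use as initialization before supervised fine-tuning

# Toy usage: 200 random binary inputs of dimension 20, two hidden layers.
X = (rng.random((200, 20)) > 0.5).astype(float)
stack = greedy_pretrain(X, layer_sizes=[15, 10])
print([W.shape for W, _ in stack])
```

In the setting the abstract describes, the stacked weights would then serve as the starting point for gradient-based supervised fine-tuning of the whole deep network, rather than training from random initialization.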