The Ebb and Flow of Deep Learning: a Theory of Local Learning.

Authors: Pierre Baldi, Peter J. Sadowski

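Abstract: In a physical neural system, where storage and processing are intimately intertwined, the rules for adjusting the synaptic weights can only depend on variables that are available locally, such as the activity of the pre- and post-synaptic neurons, resulting in local learning rules. A systematic framework for studying the space of local learning rules must first define the nature of the local variables, and then the functional form that ties them together into each learning rule. We consider polynomial local learning rules and analyze their behavior and capabilities in both linear and non-linear networks. As a byproduct, this framework also enables the discovery of new learning rules, as well as important relationships between learning rules and group symmetries. Stacking local learning rules in deep feedforward networks leads to deep local learning. While deep local learning can learn interesting representations, it cannot learn complex input-output functions, even when targets are available for the top layer. Learning complex input-output functions requires local deep learning, where target information is propagated to the deep layers through a backward channel. The nature of the propagated information about the targets, and the backward channel through which this information is propagated, partition the space of learning algorithms. For any learning algorithm, the capacity of the backward channel can be defined as the number of bits provided about the gradient per weight, divided by the number of required operations per weight. We estimate the capacity associated with several learning algorithms and show that backpropagation outperforms them and achieves the maximum possible capacity. The theory clarifies the concept of Hebbian learning and what is learnable by Hebbian learning, and explains the sparsity of the space of learning rules discovered so far.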
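To make the notion of a polynomial local learning rule concrete, the sketch below applies two textbook rules to a single linear unit. The choice of rules (simple Hebb and Oja's rule), the toy data, and all function and variable names are illustrative assumptions and are not taken from the abstract. Each weight update is a polynomial in quantities available at the synapse: the pre-synaptic activity, the post-synaptic activity, and the weight itself.

```python
import random

def hebb_update(w, pre, post, lr):
    # Simple Hebb rule: dw = lr * pre * post (polynomial of degree 2).
    return w + lr * pre * post

def oja_update(w, pre, post, lr):
    # Oja's rule: dw = lr * (pre * post - post^2 * w) (polynomial of degree 3).
    return w + lr * (pre * post - post * post * w)

def train_linear_unit(update, inputs, w0, lr, epochs):
    # Online training of one linear unit; each synapse sees only its own
    # weight, its own input, and the unit's output.
    w = list(w0)
    for _ in range(epochs):
        for x in inputs:
            post = sum(wi * xi for wi, xi in zip(w, x))
            w = [update(wi, xi, post, lr) for wi, xi in zip(w, x)]
    return w

def norm(w):
    return sum(wi * wi for wi in w) ** 0.5

# Hypothetical toy data: two strongly correlated inputs.
rng = random.Random(0)
inputs = []
for _ in range(200):
    s = rng.gauss(0.0, 1.0)
    inputs.append((s, s + 0.1 * rng.gauss(0.0, 1.0)))

w_hebb = train_linear_unit(hebb_update, inputs, (0.1, 0.05), lr=0.01, epochs=5)
w_oja = train_linear_unit(oja_update, inputs, (0.1, 0.05), lr=0.01, epochs=5)

print("Hebb weight norm:", norm(w_hebb))  # keeps growing with more training
print("Oja weight norm:", norm(w_oja))    # settles near 1
```

In this toy setting the simple Hebb rule makes the weight vector grow without bound, while the extra decay term in Oja's rule keeps its norm near one and aligns it with the dominant direction of the input correlations. Neither rule uses a target, which is the kind of purely local, unsupervised update the abstract contrasts with algorithms that propagate target information through a backward channel.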
