Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions

Authors: Mengdi Wang, Ethan X. Fang, Han Liu

DOI: 10.1007/S10107-016-1017-3

Abstract: Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values, i.e., the composition of two expected-value functions: the problem $\min_x \mathbf{E}_v\left[ f_v\big(\mathbf{E}_w[g_w(x)]\big)\right]$. In order to solve this problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. SCGD update the solutions based on noisy sample gradients of $f_v, g_w$ and use an auxiliary variable to track the unknown quantity $\mathbf{E}_w\left[g_w(x)\right]$. We prove that the SCGD iterates converge almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence analysis involves the interplay of two iterations with different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of $\mathcal{O}(k^{-1/4})$ in the general case and $\mathcal{O}(k^{-2/3})$ in the strongly convex case, after taking $k$ samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of $\mathcal{O}(k^{-2/7})$ in the general case and $\mathcal{O}(k^{-4/5})$ in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, for which we also provide a rate analysis. Indeed, the stochastic setting where one wants to optimize compositions of expected-value functions is very common in practice. The proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc.
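To make the two-time-scale idea concrete, below is a minimal Python sketch of the basic SCGD iteration on a toy problem. The update rule follows the abstract's description (an auxiliary variable $y$ tracks $\mathbf{E}_w[g_w(x)]$ via a running weighted average, while $x$ takes quasi-gradient steps through $y$); the function names (`g_sample`, `jac_g_sample`, `grad_f_sample`), the toy objective, and the specific step-size exponents are illustrative assumptions, not code from the paper.

```python
import numpy as np

def scgd(x0, grad_f_sample, g_sample, jac_g_sample, n_iters=50000):
    """Basic SCGD sketch with two time scales:
    y tracks the unknown inner expectation E_w[g_w(x)];
    x is updated with a noisy quasi-gradient computed through y."""
    x = np.asarray(x0, dtype=float)
    y = g_sample(x)  # initialize the tracking variable with one sample
    for k in range(1, n_iters + 1):
        alpha = k ** -0.75  # outer step size (decays faster)
        beta = k ** -0.5    # inner tracking step size (decays slower)
        # inner update: running weighted average tracking E_w[g_w(x)]
        y = (1.0 - beta) * y + beta * g_sample(x)
        # outer update: chain rule with a sampled Jacobian of g and gradient of f
        x = x - alpha * jac_g_sample(x).T @ grad_f_sample(y)
    return x

# toy problem: min_x (E_w[x + w])^2 with E[w] = 0, so the optimum is x = 0
rng = np.random.default_rng(0)
g_sample = lambda x: x + rng.normal(size=x.shape)  # g_w(x) = x + w
jac_g_sample = lambda x: np.eye(x.size)            # Jacobian of g_w is I
grad_f_sample = lambda y: 2.0 * y                  # f(u) = ||u||^2

print(scgd(np.array([5.0]), grad_f_sample, g_sample, jac_g_sample))  # ~0
```

The key design point is that the tracking step size `beta` decays more slowly than `alpha`, so the auxiliary variable is updated on a faster time scale than the solution iterate; this is the interplay of two time scales the abstract refers to.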
