Scaling Up Stochastic Dual Coordinate Ascent

Authors: Kenneth Tran, Saghar Hosseini, Lin Xiao, Thomas Finley, Mikhail Bilenko

DOI: 10.1145/2783258.2783412

Keywords:

Abstract: Stochastic Dual Coordinate Ascent (SDCA) has recently emerged as a state-of-the-art method for solving large-scale supervised learning problems formulated as minimization of convex loss functions. It performs iterative, random-coordinate updates to maximize the dual objective. Due to the sequential nature of its iterations, it is typically implemented as a single-threaded algorithm limited to in-memory datasets. In this paper, we introduce an asynchronous parallel version of the algorithm, analyze its convergence properties, and propose a solution for the primal-dual synchronization required to achieve convergence in practice. In addition, we describe a method for scaling to out-of-memory datasets via multi-threaded deserialization of block-compressed data. This approach yields sufficient pseudo-randomness to provide the same convergence rate as random-order in-memory access. Empirical evaluation demonstrates the efficiency of the proposed methods and their ability to fully utilize computational resources and scale to out-of-memory datasets.
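For context on the "iterative, random-coordinate updates to maximize the dual objective" that the abstract refers to, below is a minimal single-threaded sketch of SDCA for an L2-regularized hinge-loss SVM, following the standard formulation of Shalev-Shwartz and Zhang (2013). It is not the paper's asynchronous or out-of-core implementation; the function and parameter names (sdca_hinge, lam, epochs) are illustrative assumptions.

```python
# Minimal sketch of sequential SDCA for an L2-regularized hinge-loss SVM,
# in the standard formulation of Shalev-Shwartz & Zhang (2013).
# This is NOT the paper's asynchronous implementation; it only illustrates
# the per-coordinate dual update that the parallel version builds on.
import numpy as np

def sdca_hinge(X, y, lam=1e-3, epochs=10, seed=0):
    """X: (n, d) feature matrix; y: labels in {-1, +1}; lam: L2 regularization."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)          # dual variables, one per training example
    w = np.zeros(d)              # primal vector, kept equal to (1/(lam*n)) * X^T alpha
    sq_norms = np.einsum("ij,ij->i", X, X)  # precomputed ||x_i||^2
    for _ in range(epochs):
        for i in rng.permutation(n):        # random-order coordinate updates
            if sq_norms[i] == 0.0:
                continue
            # Closed-form maximization of the dual objective in coordinate i
            # (hinge-loss case), keeping all other dual variables fixed.
            margin = 1.0 - y[i] * (X[i] @ w)
            delta = (y[i] * np.clip(margin / (sq_norms[i] / (lam * n)) + alpha[i] * y[i], 0.0, 1.0)
                     - alpha[i])
            alpha[i] += delta
            w += (delta / (lam * n)) * X[i]  # keep the primal iterate in sync with the dual
    return w, alpha

# Example usage on a tiny synthetic problem:
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200))
    w, _ = sdca_hinge(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

The last line of the inner loop keeps the primal vector w consistent with the dual variables; maintaining that consistency across concurrent updates is the primal-dual synchronization issue the paper addresses for the asynchronous parallel setting.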
