Scaling Up Stochastic Dual Coordinate Ascent

Authors: Kenneth Tran, Saghar Hosseini, Lin Xiao, Thomas Finley, Mikhail Bilenko

DOI: 10.1145/2783258.2783412

Keywords:

Abstract: Stochastic Dual Coordinate Ascent (SDCA) has recently emerged as a state-of-the-art method for solving large-scale supervised learning problems formulated as minimization of convex loss functions. It performs iterative, random-coordinate updates to maximize the dual objective. Due to the sequential nature of its iterations, it is typically implemented as a single-threaded algorithm limited to in-memory datasets. In this paper, we introduce an asynchronous parallel version of the algorithm, analyze its convergence properties, and propose a solution for the primal-dual synchronization required to achieve convergence in practice. In addition, we describe a method for scaling to out-of-memory datasets via multi-threaded deserialization of block-compressed data. This approach yields sufficient pseudo-randomness to provide the same convergence rate as random-order in-memory access. Empirical evaluation demonstrates the efficiency of the proposed methods and their ability to fully utilize computational resources and scale to out-of-memory datasets.
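For context on the "iterative, random-coordinate updates to maximize the dual objective" that the abstract refers to, below is a minimal single-threaded sketch of SDCA for an L2-regularized hinge-loss SVM, following the standard formulation of Shalev-Shwartz and Zhang (2013). It is not the paper's asynchronous or out-of-core implementation; the function and parameter names (sdca_hinge, lam, epochs) are illustrative assumptions.

```python
# Minimal sketch of sequential SDCA for an L2-regularized hinge-loss SVM,
# in the standard formulation of Shalev-Shwartz & Zhang (2013).
# This is NOT the paper's asynchronous implementation; it only illustrates
# the per-coordinate dual update that the parallel version builds on.
import numpy as np

def sdca_hinge(X, y, lam=1e-3, epochs=10, seed=0):
    """X: (n, d) feature matrix; y: labels in {-1, +1}; lam: L2 regularization."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)          # dual variables, one per training example
    w = np.zeros(d)              # primal vector, kept equal to (1/(lam*n)) * X^T alpha
    sq_norms = np.einsum("ij,ij->i", X, X)  # precomputed ||x_i||^2
    for _ in range(epochs):
        for i in rng.permutation(n):        # random-order coordinate updates
            if sq_norms[i] == 0.0:
                continue
            # Closed-form maximization of the dual objective in coordinate i
            # (hinge-loss case), keeping all other dual variables fixed.
            margin = 1.0 - y[i] * (X[i] @ w)
            delta = (y[i] * np.clip(margin / (sq_norms[i] / (lam * n)) + alpha[i] * y[i], 0.0, 1.0)
                     - alpha[i])
            alpha[i] += delta
            w += (delta / (lam * n)) * X[i]  # keep the primal iterate in sync with the dual
    return w, alpha

# Example usage on a tiny synthetic problem:
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200))
    w, _ = sdca_hinge(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

The last line of the inner loop keeps the primal vector w consistent with the dual variables; maintaining that consistency across concurrent updates is the primal-dual synchronization issue the paper addresses for the asynchronous parallel setting.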
