CuMF_SGD: Fast and Scalable Matrix Factorization

Authors: Wei Tan, Liana L. Fong, Yun Liang, Xiaolong Xie


Abstract: Matrix factorization (MF) has been widely used in, e.g., recommender systems, topic modeling and word embedding. Stochastic gradient descent (SGD) is popular for solving MF problems because it can deal with large data sets and is easy to do incremental learning. We observed that SGD for MF is memory bound. Meanwhile, single-node CPU systems with caching perform well only on small data sets; distributed systems have higher aggregated memory bandwidth but suffer from relatively slow network connections. This observation inspires us to accelerate MF by utilizing GPUs' high memory bandwidth and fast intra-node connection. We present cuMF_SGD, a CUDA-based SGD solution for large-scale MF problems. On a single GPU, we design two workload schedule schemes, i.e., batch-Hogwild! and wavefront-update, that fully exploit the massive number of cores. In particular, batch-Hogwild!, as a vectorized version of Hogwild!, overcomes the issue of memory discontinuity. We also develop highly-optimized kernels for the SGD update, leveraging cache, warp-shuffle instructions and half-precision floats, as well as a partition scheme to utilize multiple GPUs while addressing the well-known convergence issue when parallelizing SGD. On three data sets with one Maxwell or Pascal GPU, cuMF_SGD runs 3.1X-28.2X as fast as state-of-the-art solutions on 1-64 CPU nodes. Evaluations also show that cuMF_SGD scales well on multiple GPUs with large data sets.
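To make the abstract's description of the SGD update concrete, below is a minimal CUDA sketch of a warp-level MF update in the spirit the paper describes (warp-shuffle dot product, lock-free Hogwild!-style writes). It is not the authors' actual kernel: the names `sgd_update` and `Rating`, the row-major P/Q layout, and fixing the latent dimension K at 32 (one feature per lane) are illustrative assumptions, and the half-precision storage and batch-Hogwild!/wavefront scheduling from the paper are omitted.

```cuda
// Minimal illustrative sketch (assumptions noted above), not the cuMF_SGD kernel.
// One warp processes one observed rating r(u, v): it computes p_u . q_v with a
// warp-shuffle reduction, then applies a plain SGD step with L2 regularization.
#include <cuda_runtime.h>

constexpr int K = 32;                     // latent dimension, one feature per lane (assumption)

struct Rating { int u; int v; float r; }; // hypothetical input record

__global__ void sgd_update(const Rating* ratings, int num_ratings,
                           float* P, float* Q, float lr, float lambda)
{
    int lane = threadIdx.x & 31;                                  // lane id within the warp
    int warp = (blockIdx.x * blockDim.x + threadIdx.x) >> 5;      // global warp id
    int num_warps = (gridDim.x * blockDim.x) >> 5;

    // Grid-stride loop: each warp walks over ratings one at a time.
    for (int i = warp; i < num_ratings; i += num_warps) {
        Rating s = ratings[i];
        float p = P[s.u * K + lane];      // this lane's feature of p_u
        float q = Q[s.v * K + lane];      // this lane's feature of q_v

        // Warp-shuffle reduction computes the dot product p_u . q_v without shared memory.
        float dot = p * q;
        for (int offset = 16; offset > 0; offset >>= 1)
            dot += __shfl_down_sync(0xffffffff, dot, offset);
        dot = __shfl_sync(0xffffffff, dot, 0);                    // broadcast result to all lanes

        float err = s.r - dot;
        // Hogwild!-style update: written back without any locking, accepting
        // occasional conflicting writes in exchange for parallelism.
        P[s.u * K + lane] = p + lr * (err * q - lambda * p);
        Q[s.v * K + lane] = q + lr * (err * p - lambda * q);
    }
}
```

The lock-free writes mirror the convergence/parallelism trade-off the abstract refers to: conflicting updates on popular users or items are tolerated rather than serialized, which is what the paper's scheduling and partition schemes are designed to keep rare.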
