clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization

Authors: Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Canqun Yang

DOI: 10.1016/J.FUTURE.2018.04.071

Keywords: Computer science, Parallel computing, Matrix decomposition, Solver, Factorization, Leverage (statistics), Linear algebra, Speedup

Abstract: Alternating least squares (ALS) has proved to be an effective solver for matrix factorization in recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-cores and many-cores. Existing implementations are limited in either performance or portability. In this paper, we present an efficient and portable ALS solver (clMF). On one hand, we diagnose the baseline implementation and observe that it lacks awareness of the hierarchical thread organization on modern hardware. To achieve high performance, we apply a thread-batching technique, a fine-grained tiling technique and three architecture-specific optimizations. On the other hand, we implement clMF in OpenCL so that it can run on various platforms (CPUs, GPUs and MICs). Based on architectural specifics, we select a suitable code variant for each platform to map it efficiently to the underlying hardware. The experimental results show that our implementation performs 2.8×–15.7× faster on an Intel 16-core CPU, 23.9×–87.9× faster on an NVIDIA K20C GPU and 34.6×–97.1× faster on an AMD Fury X GPU than the baseline implementation. On the K20C GPU, clMF also outperforms cuMF for latent feature counts ranging from 10 to 100 on real-world recommendation datasets.
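For readers unfamiliar with the ALS iteration the abstract refers to, the sketch below shows the plain alternating update on a small dense toy matrix: fixing the item factors, each user's factor row is a closed-form ridge-regression solve, and vice versa. This is a generic illustration only, not the paper's clMF implementation; the function name `als`, the toy ratings matrix, and the regularization value are all assumptions for the example. clMF's batched, tiled OpenCL kernels and architecture-specific variants are not reproduced here.

```python
import numpy as np

def als(R, k=2, n_iters=20, reg=0.1, seed=0):
    """Minimal dense ALS sketch: factor R (m x n) as U @ V.T with rank k.

    Note: this toy version also fits the zero entries of R; real
    recommender ALS solves only over observed ratings.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.standard_normal((m, k))
    V = rng.standard_normal((n, k))
    I = reg * np.eye(k)
    for _ in range(n_iters):
        # Fix V: every row of U is the ridge solution (V^T V + reg I)^{-1} V^T r_u.
        U = np.linalg.solve(V.T @ V + I, V.T @ R.T).T
        # Fix U: symmetric update for the item factors.
        V = np.linalg.solve(U.T @ U + I, U.T @ R).T
    return U, V

# Hypothetical 4-user x 4-item ratings matrix (0 = unrated in a real setting).
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [1., 0., 0., 4.]])
U, V = als(R, k=2)
err = np.linalg.norm(R - U @ V.T)  # Frobenius reconstruction error
```

Each half-iteration solves a batch of independent small k×k linear systems, which is exactly the structure that the paper's batching and fine-grained tiling exploit on parallel hardware.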
