Efficient and Portable ALS Matrix Factorization for Recommender Systems

Authors: Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Xuhao Chen

DOI: 10.1109/IPDPSW.2017.91

Keywords: Thread (computing), Recommender system, Sparse matrix, Instruction set, Solver, Software portability, Parallel computing, Matrix decomposition, Computer science, Speedup

Abstract: Alternating least squares (ALS) has proven to be an effective solver of matrix factorization for recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-core CPUs and many-core GPUs/MICs. Existing implementations are limited in either speed or portability (constrained to certain platforms). In this paper, we present an efficient and portable ALS solver. On the one hand, we diagnose the baseline implementation and observe that it lacks awareness of the hierarchical thread organization on the hardware. To achieve high performance, we apply a batching technique and three architecture-specific optimizations. On the other hand, we implement the solver in OpenCL so that it can run on various platforms (CPUs, GPUs, and MICs). Based on the architectural specifics, we select a suitable code variant for each platform, efficiently mapping it to the underlying hardware. The experimental results show that our implementation performs 5.5 times faster on a 16-core CPU and 21.2 times faster on a K20c than the baseline implementation. Our implementation also outperforms cuMF on various datasets.
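For readers unfamiliar with the solver named in the abstract, the following is a minimal serial sketch of generic ALS matrix factorization in plain Python. It is not the paper's OpenCL implementation and includes none of its batching or architecture-specific optimizations. All function names, the plain L2 regularization term `lam`, and the toy ratings are assumptions chosen for illustration.

```python
import random

def solve(A, b):
    """Solve the dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def update_factors(observations, fixed, k, lam):
    """Recompute one side's factors while the other side (`fixed`) is held constant.

    For each row, solve (sum_j v_j v_j^T + lam * I) x = sum_j r_j v_j,
    where v_j are the fixed factors of the entries rated in that row.
    """
    updated = []
    for obs in observations:  # obs: list of (index into fixed, rating)
        A = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for idx, rating in obs:
            v = fixed[idx]
            for i in range(k):
                b[i] += rating * v[i]
                for j in range(k):
                    A[i][j] += v[i] * v[j]
        updated.append(solve(A, b))
    return updated

def als(ratings, n_users, n_items, k=2, lam=0.1, iters=10, seed=0):
    """Factorize sparse (user, item, rating) triples into user factors X and item factors Y."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Y = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(iters):
        X = update_factors(by_user, Y, k, lam)  # fix items, solve for users
        Y = update_factors(by_item, X, k, lam)  # fix users, solve for items
    return X, Y

def rmse(ratings, X, Y):
    """Root-mean-square error of the predictions x_u . y_i on the given ratings."""
    err = sum((r - sum(a * b for a, b in zip(X[u], Y[i]))) ** 2 for u, i, r in ratings)
    return (err / len(ratings)) ** 0.5

# Toy data (hypothetical): 4 users, 4 items, 9 observed ratings.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 1.0), (2, 2, 5.0), (3, 0, 1.0), (3, 3, 4.0)]
X, Y = als(ratings, n_users=4, n_items=4)
print("training RMSE:", rmse(ratings, X, Y))
```

Each row's least-squares solve in `update_factors` is independent of the others, which is why the half-steps parallelize naturally across threads on the CPUs, GPUs, and MICs mentioned in the abstract.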
