SparkBench: a Spark benchmarking suite characterizing large-scale in-memory data analytics

Authors: Min Li, Jian Tan, Yandong Wang, Li Zhang, Valentina Salapura

DOI: 10.1007/S10586-016-0723-1

Keywords:

Abstract: Spark has been increasingly employed by industries for big data analytics recently, due to its resilience, scalability and efficient in-memory distributed programming model. Meanwhile, the rapidly growing community is also actively incubating a rich ecosystem around Spark to tackle various challenges. The current benchmarks fall short in providing guidance for the development, optimization, configuration and deployment of Spark. To this end, we introduce SparkBench, a Spark-specific benchmarking suite. It selectively embraces a set of representative applications to identify performance bottlenecks, and it reveals resource consumption behaviors across execution phases. Overall, SparkBench covers four critical usage patterns of Spark, including machine learning, graph processing, stream computations and SQL query processing. We present a comprehensive characterization of resource consumptions, data flows and timing information under different configurations, and demonstrate that SparkBench can effectively guide the optimization of analytic platforms to better suit Spark workloads.

References (27)
Omar Batarfi, Radwa El Shawi, Ayman G. Fayoumi, Reza Nouri, Seyed-Mehdi-Reza Beheshti, Ahmed Barnawi, Sherif Sakr. Large scale graph processing systems: survey and an experimental evaluation. Cluster Computing, vol. 18, pp. 1189-1213 (2015). DOI: 10.1007/S10586-015-0472-6
Zijian Ming, Chunjie Luo, Wanling Gao, Rui Han, Qiang Yang, Lei Wang, Jianfeng Zhan. BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. Workshop on Big Data Benchmarks, pp. 138-154 (2013). DOI: 10.1007/978-3-319-10596-3_11
Kay Ousterhout, Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, Byung-Gon Chun. Making sense of performance in data analytics frameworks. Networked Systems Design and Implementation, pp. 293-307 (2015).
Rajeev Motwani, Terry Winograd, Lawrence Page, Sergey Brin. The PageRank Citation Ranking: Bringing Order to the Web. The Web Conference, vol. 98, pp. 161-172 (1999).
Wen Xiong, Zhibin Yu, Zhendong Bei, Juanjuan Zhao, Fan Zhang, Yubin Zou, Xue Bai, Ye Li, Chengzhong Xu. A characterization of big data benchmarks. International Conference on Big Data, pp. 118-125 (2013). DOI: 10.1109/BIGDATA.2013.6691707
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears. Benchmarking cloud serving systems with YCSB. Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC '10), pp. 143-154 (2010). DOI: 10.1145/1807128.1807152
Yehuda Koren. Factorization meets the neighborhood. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), pp. 426-434 (2008). DOI: 10.1145/1401890.1401944
Min Li, Jian Tan, Yandong Wang, Li Zhang, Valentina Salapura. SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark. Computing Frontiers, pp. 53- (2015). DOI: 10.1145/2742854.2747283
Ahmad Ghazal, Tilmann Rabl, Minqing Hu, Francois Raab, Meikel Poess, Alain Crolotte, Hans-Arno Jacobsen. BigBench: towards an industry standard benchmark for big data analytics. International Conference on Management of Data, pp. 1197-1208 (2013). DOI: 10.1145/2463676.2463712