SparkBench: a Spark benchmarking suite characterizing large-scale in-memory data analytics

Authors: Min Li, Jian Tan, Yandong Wang, Li Zhang, Valentina Salapura

Abstract: Spark has recently been increasingly employed by industry for big data analytics, owing to its resilience, scalability, and efficient in-memory distributed programming model. Meanwhile, the rapidly growing Spark community is actively incubating a rich ecosystem around Spark to tackle various challenges. Current benchmarks fall short in providing guidance for the development, optimization, configuration, and deployment of Spark. To this end, we introduce SparkBench, a Spark-specific benchmarking suite. It selectively embraces a set of representative applications to identify performance bottlenecks, and it reveals resource consumption behaviors across execution phases. Overall, SparkBench covers four critical usage patterns of Spark: machine learning, graph processing, stream computation, and SQL query processing. We present a comprehensive characterization of resource consumption, data flows, and timing information under different configurations, and demonstrate that SparkBench can effectively guide the optimization of analytic platforms to better suit Spark workloads.
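
The abstract describes characterizing timing information across the execution phases of a benchmark application. The following is a minimal, illustrative sketch of per-phase wall-clock timing in plain Python, assuming a workload can be split into named phases. It is not SparkBench's code or API and does not use Spark; `PhaseTimer`, `toy_workload`, and the phase names are hypothetical. It also records only elapsed time, not CPU, memory, disk, or network consumption.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Hypothetical helper: records wall-clock duration of named execution phases."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate, so a phase entered several times is summed.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def report(self):
        """Return {phase: (seconds, share_of_total_time)}."""
        total = sum(self.durations.values())
        return {
            name: (secs, secs / total if total else 0.0)
            for name, secs in self.durations.items()
        }


def toy_workload(timer, n=200_000):
    # Hypothetical stand-in for a benchmark application with three phases.
    with timer.phase("generate"):
        data = list(range(n))
    with timer.phase("compute"):
        squares = [x * x for x in data]
    with timer.phase("aggregate"):
        result = sum(squares)
    return result


if __name__ == "__main__":
    timer = PhaseTimer()
    toy_workload(timer)
    for name, (secs, share) in timer.report().items():
        print(f"{name:10s} {secs:8.4f} s  {share:6.1%}")
```

The `try`/`finally` in `phase` ensures that a phase's elapsed time is still recorded if the phase raises. Reporting each phase's share of the total makes it easy to see which phase dominates, which is the kind of bottleneck identification the abstract refers to.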
