Authors: Min Li, Jian Tan, Yandong Wang, Li Zhang, Valentina Salapura
DOI: 10.1007/S10586-016-0723-1
Keywords:
Abstract: Spark has recently been increasingly employed by industry for big data analytics, owing to its resilience, scalability, and efficient in-memory distributed programming model. Meanwhile, the rapidly growing community is actively incubating a rich ecosystem around Spark to tackle various challenges. Current benchmarks fall short in providing guidance for the development, optimization, configuration, and deployment of Spark. To this end, we introduce SparkBench, a Spark-specific benchmarking suite. It selectively embraces a set of representative Spark applications to identify performance bottlenecks, and it reveals resource consumption behaviors across execution phases. Overall, SparkBench covers four critical usage patterns of Spark: machine learning, graph processing, stream computations, and SQL query processing. We present a comprehensive characterization of resource consumption, data flows, and timing information under different Spark configurations, and demonstrate that SparkBench can effectively guide the optimization of Spark analytic platforms to better suit the workloads.