A characterization of big data benchmarks

Authors: Wen Xiong, Zhibin Yu, Zhendong Bei, Juanjuan Zhao, Fan Zhang

DOI: 10.1109/BIGDATA.2013.6691707

Abstract: Recently, big data has evolved into a buzzword in both academia and industry all over the world. Benchmarks are important tools for evaluating an IT system. However, benchmarking big data systems is much more challenging than ever before. First, big data systems are still in their infant stage and consequently they are not well understood. Second, big data systems are more complicated compared to previous systems such as a single node computing platform. While some researchers have started to design benchmarks for big data systems, they do not consider the redundancy between their benchmarks. Moreover, they use artificial input data sets rather than real world data. It is therefore unclear whether these benchmarks can be used to precisely evaluate the performance of big data systems. In this paper, we first analyze the redundancy among benchmarks from ICTBench, HiBench, and typical workloads from real world applications: spatio-temporal data analysis for the Shenzhen transportation system. Subsequently, we present an initial idea of a big data benchmark suite for spatio-temporal data. There are three findings in this work: (1) Redundancy exists in these pioneering benchmark suites, and some of them can be removed safely. (2) The workload behavior of trajectory data analysis applications is dramatically affected by their input data sets. (3) The benchmarks created for academic research cannot represent the cases of real world applications.
