Memory-efficient groupby-aggregate using compressed buffer trees

Authors: Hrishikesh Amur, Wolfgang Richter, David G. Andersen, Michael Kaminsky, Karsten Schwan

DOI: 10.1145/2523616.2523625

Abstract: The rapid growth of fast analytics systems that require data processing in memory makes memory capacity an increasingly precious resource. This paper introduces a new compressed data structure called a Compressed Buffer Tree (CBT). Using a combination of techniques including buffering, compression, and serialization, CBTs improve the memory efficiency and performance of the GroupBy-Aggregate abstraction that forms the basis of not only batch-processing models like MapReduce, but recent fast analytics systems too. For streaming workloads, aggregation using the CBT uses 21-42% less memory than using Google SparseHash, with up to 16% better throughput. The CBT is also compared to batch-mode aggregators in MapReduce runtimes such as Phoenix++ and Metis, and consumes 4x and 5x less memory with 1.5-2x and 3-4x more performance, respectively.

References (4)
Patrick O’Neil, Edward Cheng, Dieter Gawlick, Elizabeth O’Neil. The log-structured merge-tree (LSM-tree). Acta Informatica, vol. 33, pp. 351-385 (1996). DOI: 10.1007/S002360050048
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. European Conference on Computer Systems, vol. 41, pp. 59-72 (2007). DOI: 10.1145/1272996.1273005
A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, M. A. DePristo. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, vol. 20, pp. 1297-1303 (2010). DOI: 10.1101/GR.107524.110
Jeffrey Dean, Sanjay Ghemawat. MapReduce. Communications of the ACM, vol. 51, pp. 107-113 (2008). DOI: 10.1145/1327452.1327492