Authors: Hrishikesh Amur, Wolfgang Richter, David G. Andersen, Michael Kaminsky, Karsten Schwan
Keywords:
Abstract: The rapid growth of fast analytics systems that require in-memory data processing makes memory capacity an increasingly precious resource. This paper introduces a new compressed data structure called a Compressed Buffer Tree (CBT). Using a combination of techniques including buffering, compression, and serialization, CBTs improve the memory efficiency and performance of the GroupBy-Aggregate abstraction, which forms the basis not only of batch-processing models like MapReduce but also of recent fast analytics systems. For streaming workloads, aggregation using the CBT uses 21-42% less memory than Google SparseHash, with up to 16% better throughput. The CBT is also compared to batch-mode aggregators in MapReduce runtimes such as Phoenix++ and Metis: it consumes 4x and 5x less memory, with 1.5-2x and 3-4x higher performance, respectively.