A tunable compression framework for bitmap indices

作者: Gheorghi Guzun , Guadalupe Canahuate , David Chiu , Jason Sawin

DOI: 10.1109/ICDE.2014.6816675

关键词:

摘要: Bitmap indices are widely used for large read-only repositories in data warehouses and scientific databases. Their binary representation allows the use of bitwise operations specialized run-length compression techniques. Due to a trade-off between query efficiency, bitmap schemes aligned using fixed encoding length size (typically word length) avoid explicit decompression during time. In general, smaller lengths provide better compression, but require more decoding execution. However, when difference is considerable, it possible encodings also execution We posit that tailored each bit vector will performance than one-size-fits-all approach. present framework optimizes efficiency by allowing bitmaps be compressed variable while still maintaining alignment decompression. Efficient algorithms introduced process queries over different lengths. An input parameter controls aggressiveness providing user with ability tune tradeoff space Our empirical study shows this approach achieves significant improvements terms both time ratio synthetic real sets. Compared 32-bit WAH, VAL-WAH produces up 1.8× times 30% faster.

参考文章(23)
Fabian Corrales, David Chiu, Jason Sawin, Variable Length Compression for Bitmap Indices Lecture Notes in Computer Science. pp. 381- 395 ,(2011) , 10.1007/978-3-642-23091-2_32
Frank Olken, Doron Rotem, Linda Wong, Hsiu-Fen Liu, Harry K. T. Wong, Bit transposed files very large data bases. pp. 448- 457 ,(1985)
Tan Apaydin, Ali Şaman Tosun, Hakan Ferhatosmanoglu, Analysis of Basic Data Reordering Techniques statistical and scientific database management. pp. 517- 524 ,(2008) , 10.1007/978-3-540-69497-7_33
G. Antoshenkov, Byte-aligned bitmap compression data compression conference. pp. 476- 476 ,(1995) , 10.1109/DCC.1995.515586
Kamel Aouiche, Owen Kaser, Daniel Lemire, Sorting improves word-aligned bitmap indexes data and knowledge engineering. ,vol. 69, pp. 3- 28 ,(2010) , 10.1016/J.DATAK.2009.08.006
Francesco Fusco, Marc Ph. Stoecklin, Michail Vlachos, NET-FLi Proceedings of the VLDB Endowment. ,vol. 3, pp. 1382- 1393 ,(2010) , 10.14778/1920841.1921011
Kesheng Wu, Ekow J. Otoo, Arie Shoshani, Optimizing bitmap indices with efficient compression ACM Transactions on Database Systems. ,vol. 31, pp. 1- 38 ,(2006) , 10.1145/1132863.1132864
Alessandro Colantonio, Roberto Di Pietro, Concise: Compressed 'n' Composable Integer Set Information Processing Letters. ,vol. 110, pp. 644- 650 ,(2010) , 10.1016/J.IPL.2010.05.018
Yu Su, Gagan Agrawal, Jonathan Woodring, Kary Myers, Joanne Wendelberger, James Ahrens, Taming massive distributed datasets: data sampling using bitmap indices high performance distributed computing. pp. 13- 24 ,(2013) , 10.1145/2493123.2462906