Histogram-aware sorting for enhanced word-aligned compression in bitmap indexes

作者: Owen Kaser , Daniel Lemire , Kamel Aouiche

DOI: 10.1145/1458432.1458434

关键词:

摘要: Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These are sensitive the order of rows: a simple lexicographical sort can divide index size by 9 make several times faster. We investigate reordering heuristics computed attribute-value histograms. Simply permuting columns table these histograms increase sorting efficiency 40%.

参考文章(19)
Frank Olken, Doron Rotem, Linda Wong, Hsiu-Fen Liu, Harry K. T. Wong, Bit transposed files very large data bases. pp. 448- 457 ,(1985)
Kamel Aouiche, Owen Kaser, Daniel Lemire, Tri de la table de faits et compression des index bitmaps avec alignement sur les mots arXiv: Databases. ,(2008)
Kurt Stockinger, Kesheng Wu, Arie Shoshani, Evaluation Strategies for Bitmap Indices with Binning Lecture Notes in Computer Science. pp. 120- 129 ,(2004) , 10.1007/978-3-540-30075-5_12
G. Antoshenkov, Byte-aligned bitmap compression data compression conference. pp. 476- 476 ,(1995) , 10.1109/DCC.1995.515586
Ladjel Bellatreche, Rokia Missaoui, Hamid Necir, Habiba Drias, Selection and pruning algorithms for bitmap index selection problem using data mining data warehousing and knowledge discovery. pp. 221- 230 ,(2007) , 10.1007/978-3-540-74553-2_20
Chee-Yong Chan, Yannis E. Ioannidis, Bitmap index design and evaluation Proceedings of the 1998 ACM SIGMOD international conference on Management of data - SIGMOD '98. ,vol. 27, pp. 355- 366 ,(1998) , 10.1145/276304.276336
Kesheng Wu, Ekow J. Otoo, Arie Shoshani, Optimizing bitmap indices with efficient compression ACM Transactions on Database Systems. ,vol. 31, pp. 1- 38 ,(2006) , 10.1145/1132863.1132864
Nick Koudas, Space efficient bitmap indexing Proceedings of the ninth international conference on Information and knowledge management - CIKM '00. pp. 194- 201 ,(2000) , 10.1145/354756.354819
Yashvardhan Sharma, Navneet Goyal, An Efficient Multi-Component Indexing Embedded Bitmap Compression for Data Reorganization Information Technology Journal. ,vol. 7, pp. 160- 164 ,(2008) , 10.3923/ITJ.2008.160.164
Peter D. Turney, Michael L. Littman, Corpus-based Learning of Analogies and Semantic Relations Machine Learning. ,vol. 60, pp. 251- 278 ,(2005) , 10.1007/S10994-005-0913-1