Reordering rows for better compression

作者: Daniel Lemire , Owen Kaser , Eduardo Gutarra

DOI: 10.1145/2338626.2338633

关键词:

摘要: Sorting database tables before compressing them improves the compression rate. Can we do better than lexicographical orderq For minimizing number of runs in a run-length encoding scheme, best approaches to row-ordering are derived from traveling salesman heuristics, although there is significant trade-off between running time and compression. A new heuristic, Multiple Lists, which variant on Nearest Neighbor that trades off for major running-time speedup, good option very large tables. However, some schemes, it more important generate long rather few runs. this case, another novel Vortex, promising. We find can improve up factor 3 whereas prefix coding by 80p: these gains top due lexicographically sorting table. prove row reordering optimal (within 10p) at identical values within columns, cases.

参考文章(77)
Elaheh Pourabbas, Arie Shoshani, Kesheng Wu, Minimizing Index Size by Reordering Rows and Columns Lecture Notes in Computer Science. pp. 467- 484 ,(2012) , 10.1007/978-3-642-31235-9_31
Ibrahim Kamel, Faloutsos: "hilbert r-tree: an improved r-tree using fractals very large data bases. ,(1994)
Alexander Zeier, Christian Lemke, Kai-Uwe Sattler, Franz Faerber, Speeding up queries in column stores: a case for compression data warehousing and knowledge discovery. pp. 117- 129 ,(2010)
Tom Jenkyns, Ben Stephenson, Searching and Sorting Springer, London. pp. 131- 182 ,(2013) , 10.1007/978-1-4471-4069-6_4
Piotr Indyk, Aristides Gionis, Rajeev Motwani, Similarity Search in High Dimensions via Hashing very large data bases. pp. 518- 529 ,(1999)
Todd Eavis, David Cueva, A hilbert space compression architecture for data warehouse environments data warehousing and knowledge discovery. pp. 1- 12 ,(2007) , 10.1007/978-3-540-74553-2_1
Alistair Moffat, Lang Stuiver, Binary Interpolative Coding for Effective Index Compression Information Retrieval. ,vol. 3, pp. 25- 47 ,(2000) , 10.1023/A:1013002601898
David S. Johnson, Lyle A. McGeoch, Experimental Analysis of Heuristics for the STSP Combinatorial Optimization. pp. 369- 443 ,(2007) , 10.1007/0-306-48213-4_9
Patrick O’Neil, Elizabeth O’Neil, Xuedong Chen, Stephen Revilak, The Star Schema Benchmark and Augmented Fact Table Indexing Lecture Notes in Computer Science. ,vol. 5895, pp. 237- 252 ,(2009) , 10.1007/978-3-642-10424-4_17
G. Antoshenkov, Byte-aligned bitmap compression data compression conference. pp. 476- 476 ,(1995) , 10.1109/DCC.1995.515586