作者: Daniel Lemire , Owen Kaser , Eduardo Gutarra
关键词:
摘要: Sorting database tables before compressing them improves the compression rate. Can we do better than lexicographical orderq For minimizing number of runs in a run-length encoding scheme, best approaches to row-ordering are derived from traveling salesman heuristics, although there is significant trade-off between running time and compression. A new heuristic, Multiple Lists, which variant on Nearest Neighbor that trades off for major running-time speedup, good option very large tables. However, some schemes, it more important generate long rather few runs. this case, another novel Vortex, promising. We find can improve up factor 3 whereas prefix coding by 80p: these gains top due lexicographically sorting table. prove row reordering optimal (within 10p) at identical values within columns, cases.