A Genetic Algorithm Approach for Minimizing the Number of Columnar Runs in a Column Store Table

作者: Jane Jovanovski , Maja Siljanoska , Goran Velinov

DOI: 10.1007/978-3-642-37213-1_50

关键词: Data compressionColumn (database)Similarity (geometry)Genetic algorithmSortingHeuristic (computer science)Table (database)MathematicsAlgorithmData warehouse

摘要: Column-oriented database systems, usually referred to as column stores, organize data in a column-wise manner. Column-wise can be compressed efficiently, improving the performance of large read-mostly repositories such warehouses. Many compression algorithms exploit similarity among values, where repeats same value form columnar runs. In this paper we present genetic algorithm for determining an optimal sorting order which will minimize number runs store table and therefore maximize RLE-based compression. Experiments show that performs consistently well on synthetic instances realistic datasets, resulting with higher run-reduction efficiency compared existing heuristic solving given problem.

参考文章(14)
Kwang Y. Lee, Mohamed A. El-Sharkawi, Modern heuristic optimization techniques :: theory and applications to power systems IEEE Press , John Wiley. ,(2008)
Todd Eavis, David Cueva, A hilbert space compression architecture for data warehouse environments data warehousing and knowledge discovery. pp. 1- 12 ,(2007) , 10.1007/978-3-540-74553-2_1
S. N. Deepa, S. N. Sivanandam, Introduction to genetic algorithms ,(2007)
Daniel Lemire, Owen Kaser, Reordering columns for smaller indexes Information Sciences. ,vol. 181, pp. 2550- 2570 ,(2011) , 10.1016/J.INS.2011.02.002
Daniel J. Abadi, Samuel R. Madden, Nabil Hachem, Column-stores vs. row-stores Proceedings of the 2008 ACM SIGMOD international conference on Management of data - SIGMOD '08. pp. 967- 980 ,(2008) , 10.1145/1376616.1376712
Daniel J. Abadi, Peter A. Boncz, Stavros Harizopoulos, Column-oriented database systems very large data bases. ,vol. 2, pp. 1664- 1665 ,(2009) , 10.14778/1687553.1687625
Melanie Mitchell, An Introduction to Genetic Algorithms ,(1996)
Kristian Torp, Rico Wind, Kenneth Houkjær, Simple and realistic data generation very large data bases. pp. 1243- 1246 ,(2006) , 10.5555/1182635.1164254
Daniel Abadi, Samuel Madden, Miguel Ferreira, Integrating compression and execution in column-oriented database systems international conference on management of data. pp. 671- 682 ,(2006) , 10.1145/1142473.1142548
Daniel Lemire, Owen Kaser, Eduardo Gutarra, Reordering rows for better compression ACM Transactions on Database Systems. ,vol. 37, pp. 1- 29 ,(2012) , 10.1145/2338626.2338633