A meta-heuristic approach for RLE compression in a column store table

作者: Jane Jovanovski , Nino Arsov , Evgenija Stevanoska , Maja Siljanoska Simons , Goran Velinov

DOI: 10.1007/S00500-018-3081-5

关键词:

摘要: Structured data are one of the most important segments in realm big analysis that have undeniably prevailed over years. In recent years, column-oriented design has become a frequent practice to organize structured analytical systems. The storage systems column-wise manner often referred as column stores. Column-oriented databases or warehouses and spreadsheet applications particular recently popular convenient tool for processing analysis. At same time, volume is increasing at an extreme rate, which despite decrease pricing stresses importance compression. Apart from resounding performance gain large read-mostly repositories, easily compressible, enables efficient query pushes peak overall performance. Many compression algorithms, including Run Length Encoding (RLE), exploit similarity among values, where repetitions value form columnar runs can be found database This paper presents comprehensive comparison common well-known meta-heuristics run minimization, based on standard implementations by using real datasets. We analyzed genetic simulated annealing, cuckoo search, particle swarm optimization, Tabu bat algorithm. first three being undergone sensitivity synthetic datasets fine-tune their parameters. These were then tested experiments show algorithms perform consistently well both data, demonstrating higher run-reduction efficiency compared existing approaches. Moreover, results applied exhibit quick convergence nearly optimal solutions, accompanied insignificant overhead. addition, we provide heuristic RLE approaches optimization methods. They effective physical extent makes them suitable everyday appliances. also indicate our able overcome expected on-disk file ratio, cases better than respective reduction runs.

参考文章(55)
Kwang Y. Lee, Mohamed A. El-Sharkawi, Modern heuristic optimization techniques :: theory and applications to power systems IEEE Press , John Wiley. ,(2008)
Jane Jovanovski, Maja Siljanoska, Goran Velinov, A Genetic Algorithm Approach for Minimizing the Number of Columnar Runs in a Column Store Table Adaptive and Natural Computing Algorithms. pp. 485- 494 ,(2013) , 10.1007/978-3-642-37213-1_50
Evolutionary Computation 1 IOP Publishing Ltd. ,(2000) , 10.1887/0750306645
Gai-Ge Wang, Suash Deb, Amir H. Gandomi, Zhaojun Zhang, Amir H. Alavi, Chaotic cuckoo search soft computing. ,vol. 20, pp. 3349- 3362 ,(2016) , 10.1007/S00500-015-1726-1
Marco Dorigo, Mauro Birattari, Thomas Stutzle, Ant colony optimization: artificial ants as a computational intelligence technique IEEE Computational Intelligence Magazine. ,vol. 1, pp. 28- 39 ,(2006) , 10.1109/CI-M.2006.248054
Todd Eavis, David Cueva, A hilbert space compression architecture for data warehouse environments data warehousing and knowledge discovery. pp. 1- 12 ,(2007) , 10.1007/978-3-540-74553-2_1
S. N. Deepa, S. N. Sivanandam, Introduction to genetic algorithms ,(2007)
M. Birattari, T. Stutzle, M. Dorigo, Ant Colony Optimization ,(2004)
Dixin Tang, Taoying Liu, Rubao Lee, Hong Liu, Wei Li, A Case Study of Optimizing Big Data Analytical Stacks Using Structured Data Shuffling international conference on cluster computing. pp. 70- 73 ,(2015) , 10.1109/CLUSTER.2015.19