Authors: Jane Jovanovski, Nino Arsov, Evgenija Stevanoska, Maja Siljanoska Simons, Goran Velinov
DOI: 10.1007/S00500-018-3081-5
Keywords:
Abstract: Structured data are one of the most important segments in the realm of big data analysis and have undeniably prevailed over the years. In recent years, column-oriented design has become a frequent practice to organize structured data in analytical systems. Storage systems that organize data in a column-wise manner are often referred to as column stores. Column-oriented databases or data warehouses, and spreadsheet applications in particular, have recently become a popular and convenient tool for data processing and analysis. At the same time, the volume of data is increasing at an extreme rate, which, despite the decrease in pricing, stresses the importance of compression. Apart from the resounding performance gain in large read-mostly repositories, column-oriented data are easily compressible, which enables efficient query processing and pushes the peak of overall performance. Many compression algorithms, including Run Length Encoding (RLE), exploit the similarity among column values, where repetitions of a value form columnar runs that can be found in database systems. This paper presents a comprehensive comparison of common, well-known meta-heuristics for run minimization, based on standard implementations and using real datasets. We analyzed genetic algorithms, simulated annealing, cuckoo search, particle swarm optimization, Tabu search, and the bat algorithm. The first three have undergone sensitivity analysis on synthetic datasets to fine-tune their parameters. These were then tested on real datasets. The experiments show that the algorithms perform consistently well on both synthetic and real data, demonstrating higher run-reduction efficiency compared to existing approaches. Moreover, the results exhibit quick convergence to nearly optimal solutions, accompanied by insignificant overhead. In addition, we provide a comparison of heuristic RLE approaches and the optimization methods. They are effective to a physical extent that makes them suitable for everyday appliances. The results also indicate that our methods are able to overcome the expected on-disk file compression ratio, with cases in which it is better than the respective reduction of runs.
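
To make the notion of columnar runs concrete, the following is a minimal Python sketch, not the authors' implementation. The helper names (`rle_encode`, `count_runs`), the toy table, and the use of lexicographic sorting and brute-force search are all assumptions introduced for illustration. The sketch only shows why row order changes the number of runs that RLE has to store, and why a simple sort does not necessarily reach the minimum.

```python
from itertools import groupby, permutations


def rle_encode(column):
    """Encode a sequence as (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def count_runs(rows):
    """Total number of runs over all columns of a row-major table."""
    if not rows:
        return 0
    # zip(*rows) transposes the rows into columns.
    return sum(len(rle_encode(column)) for column in zip(*rows))


# Hypothetical toy table with two columns.
table = [("b", 1), ("a", 2), ("b", 2), ("a", 1)]

# Baseline heuristic (an assumption of this sketch): lexicographic row sort.
sorted_rows = sorted(table)

# Exhaustive search is feasible only for tiny tables (n! row orders),
# which is why heuristic or meta-heuristic search is needed at scale.
best_rows = min(permutations(table), key=count_runs)

runs_original = count_runs(table)
runs_sorted = count_runs(sorted_rows)
runs_best = count_runs(best_rows)
```

On this toy table the original row order yields 7 runs, the lexicographic sort yields 6, and the best of the 24 possible orders yields 5. The meta-heuristics named in the abstract explore the same space of row orders without enumerating it.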