Probery: A Probability-based Incomplete Query Optimization for Big Data.

Authors: Ge Yu, Yichuan Zhang, Yubin Bao, Jie Song

DOI:

Keywords: Data mining, Completeness (order theory), NoSQL, Online transaction processing, Computer science, Big data, Query optimization

Abstract: Query optimization has become a major concern in big data management, especially for NoSQL databases. Approximate queries boost performance at the cost of accuracy; for example, sampling approaches trade off query completeness for efficiency. Unlike these approaches, we propose an uncertainty of query completeness, called Probability Completeness (PC for short). PC refers to the probability that the query results contain all satisfying records. For example, PC=0.95 guarantees that no more than 5 out of 100 queries are incomplete, but it does not guarantee how incomplete they are. We trade PC for query performance, and experiments show that a small loss of PC doubles performance. The proposed Probery (PROBability-based quERY) adopts PC to accelerate OLTP queries. This paper presents the probability models, the probability-based data placement and query processing, and the Apache Drill-based implementation of Probery. In the experiments, we first show that the percentage of complete queries is larger than the given PC confidence in various cases, that is, the PC guarantee holds. Probery is then compared with Drill, Impala, and Hive in terms of query performance. The results indicate that Probery performs as fast as Drill on complete queries, while being on average 1.8x, 1.3x, and 1.6x faster than Drill, Impala, and Hive, respectively, on possibly complete queries.
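The PC=0.95 example in the abstract can be made concrete with a small check. The following is only an illustrative sketch, not the paper's method or Probery's API: the function names `fraction_complete` and `meets_pc` and the sample data are hypothetical. It counts a query as complete when its returned set contains every satisfying record, and compares the share of complete queries against a PC threshold.

```python
from typing import List, Set, Tuple

# One run = (records returned by the query, records that actually satisfy it).
Run = Tuple[Set[int], Set[int]]


def fraction_complete(runs: List[Run]) -> float:
    """Share of queries whose result contains every satisfying record."""
    if not runs:
        raise ValueError("need at least one query run")
    # A query is complete when the expected set is a subset of the returned set.
    complete = sum(1 for returned, expected in runs if expected <= returned)
    return complete / len(runs)


def meets_pc(runs: List[Run], pc: float) -> bool:
    """True when the observed share of complete queries is at least pc."""
    return fraction_complete(runs) >= pc


# Hypothetical data: 100 queries that should each return record ids {1, 2, 3}.
expected = {1, 2, 3}
runs: List[Run] = (
    [({1, 2, 3}, expected)] * 96   # complete results
    + [({1, 2}, expected)] * 3     # one record missing
    + [({1}, expected)]            # two records missing
)

print(fraction_complete(runs))
print(meets_pc(runs, 0.95))
```

In this sample, the query that returns only `{1}` counts as a single incomplete query, just like one missing a single record. This mirrors the abstract's point that PC bounds how many queries are incomplete, not how incomplete each one is.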

