Authors: Ge Yu, Yichuan Zhang, Yubin Bao, Jie Song
DOI:
Keywords: Data mining, Completeness (order theory), NoSQL, Online transaction processing, Computer science, Big data, Query optimization
Abstract: Query optimization has received considerable attention in big data management, especially for NoSQL databases. Approximate queries boost performance at the cost of accuracy; for example, sampling approaches trade off query completeness for efficiency. In contrast, we propose an uncertainty of query completeness, called Probability of Completeness (PC for short). PC refers to the probability that query results contain all records that satisfy the query. For example, PC = 0.95 guarantees that no more than 5 queries among 100 are incomplete, but it does not guarantee how incomplete they are. We trade PC for query performance, and experiments show that a small loss of PC doubles performance. The proposed Probery (PROBability-based quERY) adopts this uncertainty of query completeness to accelerate OLTP queries. This paper illustrates the data and probability models, the probability-based data placement and query processing, and the Apache Drill-based implementation of Probery. In experiments, we first show that the percentage of complete queries is larger than the given PC confidence in various cases, that is, the PC guarantee holds. Probery is then compared with Drill, Impala, and Hive in terms of query performance. The results indicate that Probery performs as fast as Drill for complete queries, while being on average 1.8x, 1.3x, and 1.6x faster than Drill, Impala, and Hive, respectively, for possibly complete queries.
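
The PC = 0.95 example in the abstract can be read as a simple check on observed query outcomes. The following is a minimal sketch of that reading only, not Probery's implementation: the function name `meets_pc` and the list-of-booleans input are hypothetical, and the sketch says nothing about how incomplete an individual query is, which the abstract also leaves unguaranteed.

```python
def meets_pc(complete_flags, pc):
    """Return True if the observed share of complete queries is at least pc.

    complete_flags: one boolean per executed query (True = the result
    contained every matching record). Hypothetical input format.
    pc: the required probability of completeness, e.g. 0.95.
    """
    if not complete_flags:
        raise ValueError("need at least one query outcome")
    if not 0.0 <= pc <= 1.0:
        raise ValueError("pc must lie in [0, 1]")
    # Booleans sum as 0/1, so this is the fraction of complete queries.
    observed = sum(complete_flags) / len(complete_flags)
    return observed >= pc


# 100 hypothetical queries, 5 of which returned incomplete results.
outcomes = [True] * 95 + [False] * 5
within_guarantee = meets_pc(outcomes, 0.95)
```

With 5 incomplete queries out of 100 the check passes for PC = 0.95; a sixth incomplete query would make it fail.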