Authors: Yongjoo Park, Barzan Mozafari, Joseph Sorenson, Junhao Wang
Keywords: Correctness, Speedup, Database, Result set, SQL, Spark (mathematics), Computer science
Abstract: Despite 25 years of research in academia, approximate query processing (AQP) has had little industrial adoption. One of the major causes of this slow adoption is the reluctance of traditional vendors to make radical changes to their legacy codebases, and the preoccupation of newer vendors (e.g., SQL-on-Hadoop products) with implementing standard features. Additionally, the few AQP engines that are available are each tied to a specific platform and require users to completely abandon their existing databases, an unrealistic expectation given the infancy of the technology. Therefore, we argue that a universal solution is needed: a database-agnostic approximation engine that will widen the reach of this emerging technology across various platforms. Our proposal, called VerdictDB, uses a middleware architecture that requires no changes to the backend database, and thus, can work with all off-the-shelf engines. Operating at the driver-level, VerdictDB intercepts analytical queries issued to the database and rewrites them into another query that, if executed by any relational engine, will yield sufficient information for computing an approximate answer. VerdictDB uses the returned result set to compute an approximate answer and error estimates, which are then passed on to the user or application. However, the lack of access to the query execution layer introduces significant challenges in terms of generality, correctness, and efficiency. This paper shows how VerdictDB overcomes these challenges and delivers up to 171× speedup (18.45× on average) for a variety of engines, such as Impala, Spark SQL, and Amazon Redshift, while incurring less than 2.6% relative error. VerdictDB is open-sourced under the Apache License.
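The rewrite-then-estimate flow described in the abstract can be illustrated with a minimal sketch. This is not VerdictDB's actual rewriting logic or API: it assumes a plain uniform (Bernoulli) sample, a single `SUM` aggregate, and SQLite as a stand-in backend. The helper names `build_sample`, `rewrite_sum`, and `approximate_sum` are hypothetical. The point is only that the middleware sends ordinary SQL to the backend and derives both the answer and an error estimate from the returned result set.

```python
import math
import random
import sqlite3


def build_sample(conn, table, sample_table, ratio, seed=0):
    """Materialize a Bernoulli sample of `table`: each row is kept with
    probability `ratio`. Hypothetical helper, done client-side for determinism."""
    rng = random.Random(seed)
    # Create an empty table with the same columns as the source table.
    conn.execute(f"CREATE TABLE {sample_table} AS SELECT * FROM {table} WHERE 0")
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    kept = [row for row in rows if rng.random() < ratio]
    if kept:
        placeholders = ",".join("?" * len(kept[0]))
        conn.executemany(
            f"INSERT INTO {sample_table} VALUES ({placeholders})", kept
        )


def rewrite_sum(column, sample_table):
    """Rewrite SUM(column) into standard SQL over the sample table.
    The extra sum of squares is what lets the client estimate the error."""
    return (
        f"SELECT COUNT(*), SUM({column}), SUM({column} * {column}) "
        f"FROM {sample_table}"
    )


def approximate_sum(conn, column, sample_table, ratio):
    """Run the rewritten query and return (estimate, standard_error)."""
    n, total, total_sq = conn.execute(rewrite_sum(column, sample_table)).fetchone()
    if n == 0:
        return 0.0, 0.0
    # Horvitz-Thompson estimate of the population total under Bernoulli sampling.
    estimate = total / ratio
    # Matching variance estimate: (1 - p) / p^2 * sum of squared sampled values.
    std_error = math.sqrt((1.0 - ratio) * total_sq) / ratio
    return estimate, std_error


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, price REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, float(i % 100 + 1)) for i in range(10000)],
    )
    build_sample(conn, "orders", "orders_sample", ratio=0.1, seed=42)
    estimate, std_error = approximate_sum(conn, "price", "orders_sample", 0.1)
    exact = conn.execute("SELECT SUM(price) FROM orders").fetchone()[0]
    print(f"exact={exact:.1f} estimate={estimate:.1f} std_error={std_error:.1f}")
```

Because the rewritten query uses only `COUNT` and `SUM`, any relational engine can run it. All approximation logic stays on the client side of the driver, which is the property the abstract attributes to the middleware design.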