Authors: Kai Zeng, Shi Gao, Barzan Mozafari, Carlo Zaniolo
Keywords:
Abstract: Sampling is one of the most commonly used techniques in Approximate Query Processing (AQP), an area of research that is now made more critical by the need for timely and cost-effective analytics over "Big Data". Assessing the quality (i.e., estimating the error) of approximate answers is essential for meaningful AQP, and the two main approaches used in the past to address this problem are based on either (i) analytic error quantification or (ii) the bootstrap method. The first approach is extremely efficient but lacks generality, whereas the second is quite general but suffers from high computational overhead. In this paper, we introduce a probabilistic relational model of the bootstrap process, along with rigorous semantics and a unified error model, which bridges the gap between these two traditional approaches. Based on our probabilistic framework, we develop efficient algorithms to predict the distribution of the approximation results. These enable the computation of any bootstrap-based quality measure for a large class of SQL queries via a single-round evaluation of a slightly modified query. Extensive experiments on both synthetic and real-world datasets show that our method has superior prediction accuracy for bootstrap-based quality measures and is several orders of magnitude faster than bootstrap.
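
The contrast the abstract draws between the two classical approaches can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It only compares a closed-form standard error with a naive bootstrap estimate for an AVG computed over a sample. The synthetic data, the function names (`analytic_stderr`, `bootstrap_stderr`), and the resample count are assumptions made for illustration.

```python
import random
import statistics


def analytic_stderr(sample):
    """Closed-form standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / len(sample) ** 0.5


def bootstrap_stderr(sample, num_resamples=1000, seed=0):
    """Standard error of the mean, estimated by resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(num_resamples):
        # Each bootstrap round re-evaluates the aggregate on a fresh resample.
        resample = rng.choices(sample, k=n)
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)


# Hypothetical data: a 2,000-row sample of a numeric column.
data_rng = random.Random(42)
sample = [data_rng.gauss(100.0, 15.0) for _ in range(2000)]

print("approximate AVG: ", statistics.fmean(sample))
print("analytic stderr: ", analytic_stderr(sample))
print("bootstrap stderr:", bootstrap_stderr(sample))
```

For a simple mean the two estimates should roughly agree, but the bootstrap evaluates the aggregate once per resample (1,000 times here), while the closed form needs a single pass and exists only for aggregates with a known variance formula. This is the cost versus generality trade-off that the abstract describes.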