Join synopses for approximate query answering

作者: Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy

DOI: 10.1145/304181.304207

关键词:

摘要: In large data warehousing environments, it is often advantageous to provide fast, approximate answers complex aggregate queries based on statistical summaries of the full data. this paper, we demonstrate difficulty providing good for join-queries using only statistics (in particular, samples) from base relations. We propose join synopses as an effective solution problem and show how precomputing just one synopsis each relation suffices significantly improve quality arbitrary with foreign key joins. present optimal strategies allocating available space among various when query work load known identify heuristics common case not known. also efficient algorithms incrementally maintaining in presence updates Our extensive set experiments TPC-D benchmark database effectiveness other techniques proposed paper.

参考文章(30)
Viswanath Poosala, Histogram-based estimation techniques in database systems University of Wisconsin at Madison. ,(1997)
Lynne Stokes, Jeffrey F. Naughton, Peter J. Haas, S. Seshadri, Sampling-Based Estimation of the Number of Distinct Values of an Attribute very large data bases. pp. 311- 322 ,(1995)
F. Olken, D. Rotem, Maintenance of materialized views of sampling queries international conference on data engineering. pp. 632- 641 ,(1992) , 10.1109/ICDE.1992.213145
Yossi Matias, Neal E. Young, S. Cenk Sahinalp, Performance evaluation of approximate priority queues ,(1996)
Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider, Practical selectivity estimation through adaptive sampling international conference on management of data. ,vol. 19, pp. 1- 11 ,(1990) , 10.1145/93597.93611
R.J. Lipton, J.F. Naughton, Query Size Estimation by Adaptive Sampling Journal of Computer and System Sciences. ,vol. 51, pp. 18- 25 ,(1995) , 10.1006/JCSS.1995.1050
Wen-Chi Hou, Gultekin Ozsoyoglu, Baldeo K. Taneja, Statistical estimators for relational algebra expressions symposium on principles of database systems. pp. 276- 287 ,(1988) , 10.1145/308386.308455
Phillip B. Gibbons, Yossi Matias, New sampling-based summary statistics for improving approximate query answers Proceedings of the 1998 ACM SIGMOD international conference on Management of data - SIGMOD '98. ,vol. 27, pp. 331- 342 ,(1998) , 10.1145/276304.276334
Yossi Matias, Phillip B. Gibbons, Synopsis data structures for massive data sets symposium on discrete algorithms. pp. 909- 910 ,(1999)
Noga Alon, Yossi Matias, Mario Szegedy, The space complexity of approximating the frequency moments symposium on the theory of computing. pp. 20- 29 ,(1996) , 10.1145/237814.237823