Selectivity and Cost Estimation for Joins Based on Random Sampling

作者: Peter J. Haas , Jeffrey F. Naughton , S. Seshadri , Arun N. Swami

DOI: 10.1006/JCSS.1996.0041

关键词:

摘要: We compare the performance of sampling-based procedures for estimating selectivity a join. While some have been proposed in database literature, their relative has never analyzed. A main result this paper is partial ordering that compares variability estimators different after an arbitrary fixed number sampling steps. Prior to current work, it was also unknown whether these fixed-step could be extended fixed-precision are both asymptotically consistent and efficient. Our second general method such extension proof valid all under consideration. show that, plausible assumptions on costs, with respect estimator implies corresponding cost. final collection cost processing join query according plan.

参考文章(33)
Allen Van Gelder, Multiple Join Size Estimation by Virtual Domains. symposium on principles of database systems. pp. 180- 189 ,(1993)
Carlo Zaniolo, Ravi Krishnamurthy, Haran Boral, Optimization of Nonrecursive Queries very large data bases. pp. 128- 137 ,(1986)
Arun Swami, K. Bernhard Schiefer, On the estimation of join result sizes extending database technology. pp. 287- 300 ,(1994) , 10.1007/3-540-57818-8_58
Laura M. Haas, Michael J. Carey, Miron Livny, Amit Shukla, Sing the truth about ad hoc join costs very large data bases. ,vol. 6, pp. 241- 256 ,(1997) , 10.1007/S007780050043
S. Seshadri, Probabilistic methods in query processing University of Wisconsin at Madison. ,(1992)
J. F. Naughton, R. J. Lipton, Estimating the size of generalized transitive closures very large data bases. pp. 165- 171 ,(1989)
Frank Olken, Doron Rotem, Simple Random Sampling from Relational Databases very large data bases. pp. 160- 169 ,(1986)
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri, Practical Skew Handling in Parallel Joins very large data bases. pp. 27- 40 ,(1992)