A learning-based approach to estimate statistics of operators in continuous queries

Authors: Like Gao, Min Wang, X. Sean Wang, Sriram Padmanabhan

DOI: 10.1145/882082.882097

Keywords:

Abstract: Statistic estimation, such as estimating the output size of operators, is a well-studied subject in the database research community, mainly for the purpose of query optimization. The assumption, however, is that queries are ad hoc, and therefore the emphasis has been on capturing the data distribution. When long-standing continuous queries over changing data are concerned, a more direct approach, namely building an estimation model for each operator, is possible. In this paper, we propose a novel learning-based method. Our method consists of two steps. The first step is to design a dedicated feature extraction algorithm that can be used to incrementally obtain feature values from the underlying data. The second step is to use a data mining algorithm to generate an estimation model based on the feature values extracted from historical data. To illustrate the approach, this paper studies the case of similarity-based searches over streaming time series. Experimental results show that the approach provides accurate statistic estimates with low overhead.
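The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of features (mean and variance of a sliding window), the one-feature least-squares model, and all names (`WindowFeatures`, `fit_line`, `estimate`) are assumptions made only for illustration.

```python
from collections import deque


class WindowFeatures:
    """Step 1 (assumed features): mean and variance of the last `size` values."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0      # running sum of values in the window
        self.total_sq = 0.0   # running sum of squared values in the window

    def update(self, value):
        # Add the new value and evict the oldest one, both in O(1),
        # so features are obtained incrementally rather than recomputed.
        self.window.append(value)
        self.total += value
        self.total_sq += value * value
        if len(self.window) > self.size:
            old = self.window.popleft()
            self.total -= old
            self.total_sq -= old * old
        n = len(self.window)
        mean = self.total / n
        variance = max(self.total_sq / n - mean * mean, 0.0)
        return mean, variance


def fit_line(xs, ys):
    """Step 2 (assumed model): least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def estimate(model, feature):
    """Predict an operator statistic (e.g. output size) from one feature value."""
    a, b = model
    return a * feature + b
```

In use, each arriving stream value would be passed to `update`, the resulting feature values would be paired with the operator's observed output sizes to form the historical data, and `fit_line` would be rerun periodically on that history. A real system would likely need richer features and a stronger learner than a single straight line.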
