Approximate continuous querying over distributed streams

作者: Graham Cormode , Minos Garofalakis

DOI: 10.1145/1366102.1366106

关键词:

摘要: While traditional database systems optimize for performance on one-shot query processing, emerging large-scale monitoring applications require continuous tracking of complex data-analysis queries over collections physically distributed streams. Thus, effective solutions have to be simultaneously space/time efficient (at each remote monitor site), communication (across the underlying network), and provide continuous, guaranteed-quality approximate answers. In this paper, we propose novel algorithmic problem continuously a broad class aggregate in such distributed-streams setting. Our schemes maintain answers with provable error guarantees, while optimizing storage space processing time at site, cost across network. nutshell, our algorithms rely general-purpose randomized sketch summaries local streams sites along concise prediction models site behavior order produce highly communication- space/time-efficient solutions. The end result is powerful framework that readily incorporates several analysis (including join multi-join aggregates, wavelet representations), thus giving first known low-overhead solution model. Experiments real data validate approach, revealing significant savings naive as well analytical worst-case guarantees.

参考文章(34)
Moses Charikar, Kevin Chen, Martin Farach-Colton, Finding Frequent Items in Data Streams international colloquium on automata languages and programming. ,vol. 312, pp. 693- 703 ,(2002) , 10.1016/S0304-3975(03)00400-6
Daniela Tulone, Samuel Madden, PAQ: time series forecasting for approximate query answering in sensor networks international conference on embedded wireless systems and networks. pp. 21- 37 ,(2006) , 10.1007/11669463_5
Abhinandan Das, Sumit Ganguly, Minos Garofalakis, Rajeev Rastogi, Distributed set-expression cardinality estimation very large data bases. pp. 312- 323 ,(2004) , 10.1016/B978-012088469-8.50030-9
Mayur Datar, Aristides Gionis, Piotr Indyk, Rajeev Motwani, Maintaining Stream Statistics over Sliding Windows SIAM Journal on Computing. ,vol. 31, pp. 1794- 1813 ,(2002) , 10.1137/S0097539701398363
Piotr Indyk, Aristides Gionis, Rajeev Motwani, Mayur Datar, Maintaining stream statistics over sliding windows: (extended abstract) symposium on discrete algorithms. pp. 635- 644 ,(2002) , 10.5555/545381.545466
Noga Alon, Phillip B. Gibbons, Yossi Matias, Mario Szegedy, Tracking join and self-join sizes in limited storage symposium on principles of database systems. pp. 10- 20 ,(1999) , 10.1145/303976.303978
Michael B. Greenwald, Sanjeev Khanna, Power-conserving computation of order-statistics over sensor networks Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '04. pp. 275- 285 ,(2004) , 10.1145/1055558.1055597
Ankur Jain, Edward Y. Chang, Yuan-Fang Wang, Adaptive stream resource management using Kalman Filters international conference on management of data. pp. 11- 22 ,(2004) , 10.1145/1007568.1007573
Nitin Thaper, Sudipto Guha, Piotr Indyk, Nick Koudas, Dynamic multidimensional histograms Proceedings of the 2002 ACM SIGMOD international conference on Management of data - SIGMOD '02. pp. 428- 439 ,(2002) , 10.1145/564691.564741
Noga Alon, Yossi Matias, Mario Szegedy, The space complexity of approximating the frequency moments symposium on the theory of computing. pp. 20- 29 ,(1996) , 10.1145/237814.237823