Authors: Graham Cormode, Minos Garofalakis
Keywords:
Abstract: While traditional database systems optimize for performance on one-shot query processing, emerging large-scale monitoring applications require continuous tracking of complex data-analysis queries over collections of physically distributed streams. Thus, effective solutions have to be simultaneously space/time efficient (at each remote monitor site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality approximate query answers. In this paper, we propose novel algorithmic solutions for the problem of continuously tracking a broad class of complex aggregate queries in such a distributed-streams setting. Our tracking schemes maintain approximate query answers with provable error guarantees, while simultaneously optimizing the storage space and processing time at each remote site, as well as the communication cost across the network. In a nutshell, our algorithms rely on tracking general-purpose randomized sketch summaries of local streams at remote sites along with concise prediction models of local site behavior in order to produce highly communication- and space/time-efficient solutions. The end result is a powerful approximate query tracking framework that readily incorporates several complex analysis queries (including distributed join and multi-join aggregates, and approximate wavelet representations), thus giving the first known low-overhead tracking solution for such queries in the distributed-streams model. Experiments with real data validate our approach, revealing significant savings over naive solutions as well as our analytical worst-case guarantees.