AMS-Sampling in Distributed Monitoring, with Applications to Entropy.

作者: Jiecao Chen , Qin Zhang

DOI:

关键词:

摘要: Modern data management systems often need to deal with massive, dynamic and inherently distributed sources: we collect the using a network, at same time try maintain global view of central coordinator. Such applications have been captured by monitoring model, which has attracted lot attention recently in both theory database communities. However, all proposed algorithms provable guarantees are ad-hoc nature, each being designed for specific problem. In this paper propose first generic algorithmic approach, adapting celebrated AMS-sampling framework from streaming model monitoring. We also show how use monitor entropy functions. Our results significantly improve previous best Arackaparambil et al. [2]

参考文章(30)
Srikanta Tirthapura, David P. Woodruff, Optimal Random Sampling from Distributed Streams Revisited Lecture Notes in Computer Science. ,vol. 6950, pp. 283- 297 ,(2011) , 10.1007/978-3-642-24100-0_27
Chrisil Arackaparambil, Joshua Brody, Amit Chakrabarti, Functional Monitoring without Monotonicity Automata, Languages and Programming. pp. 95- 106 ,(2009) , 10.1007/978-3-642-02927-1_10
Amit Chakrabarti, Khanh Do Ba, S. Muthukrishnan, Estimating entropy and entropy norm on data streams symposium on theoretical aspects of computer science. ,vol. 3, pp. 196- 205 ,(2006) , 10.1007/11672142_15
Phillip B. Gibbons, Srikanta Tirthapura, Distributed streams algorithms for sliding windows Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '02. pp. 63- 72 ,(2002) , 10.1145/564870.564880
Sumit Ganguly, Minos Garofalakis, Rajeev Rastogi, Tracking set-expression cardinalities over continuous update streams very large data bases. ,vol. 13, pp. 354- 369 ,(2004) , 10.1007/S00778-004-0135-3
Graham Cormode, S. Muthukrishnan, Ke Yi, Qin Zhang, Optimal sampling from distributed streams Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems of data - PODS '10. pp. 77- 86 ,(2010) , 10.1145/1807085.1807099
Leslie G. Valiant, A bridging model for parallel computation Communications of the ACM. ,vol. 33, pp. 103- 111 ,(1990) , 10.1145/79173.79181
Zengfeng Huang, Ke Yi, Qin Zhang, Randomized algorithms for tracking distributed count, frequencies, and ranks Proceedings of the 31st symposium on Principles of Database Systems - PODS '12. pp. 295- 306 ,(2012) , 10.1145/2213556.2213596
Suresh Venkatasubramanian, Andrew McGregor, Sudipto Guha, Streaming and sublinear approximation of entropy and information distances symposium on discrete algorithms. pp. 733- 742 ,(2006) , 10.5555/1109557.1109637
Graham Cormode, S. Muthukrishnan, An improved data stream summary: the count-min sketch and its applications Journal of Algorithms. ,vol. 55, pp. 58- 75 ,(2005) , 10.1016/J.JALGOR.2003.12.001