Estimating entropy and entropy norm on data streams

作者: Amit Chakrabarti , Khanh Do Ba , S. Muthukrishnan

DOI: 10.1007/11672142_15

关键词:

摘要: We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result deals with measure we call “entropy norm” an input stream: it is closely related to but structurally similar well-studied notion frequency moments. give polylogarithmic space one-pass algorithm for estimating this norm under certain conditions stream. also prove lower bound that rules out if these do not hold. Our second group results are empirical present problem. For stream m items and given real parameter α, our uses $\tilde{O}(m^{2\alpha})$ provides approximation 1/α in worst case (1+e) “most” cases. then two-pass (1+e)-approximation algorithm. All algorithms quite simple.

参考文章(18)
Theodore Johnson, Oliver Spatscheck, Vladislav Shkapenyuk, Charles D. Cranor, The Gigascope Stream Database. IEEE Data(base) Engineering Bulletin. ,vol. 26, pp. 27- 32 ,(2003)
Robert S. Boyer, J. Strother Moore, MJRTY - A Fast Majority Vote Algorithm. Automated Reasoning: Essays in Honor of Woody Bledsoe. pp. 105- 117 ,(1991) , 10.1007/978-94-011-3488-0_5
Don Coppersmith, Ravi Kumar, An improved data stream algorithm for frequency moments symposium on discrete algorithms. pp. 151- 156 ,(2004) , 10.5555/982792.982815
Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom, Models and issues in data stream systems symposium on principles of database systems. pp. 1- 16 ,(2002) , 10.1145/543613.543615
Cristian Estan, George Varghese, New directions in traffic measurement and accounting ACM Transactions on Computer Systems. ,vol. 21, pp. 270- 313 ,(2003) , 10.1145/859716.859719
Theodore Johnson, S. Muthukrishnan, Irina Rozenbaum, Sampling algorithms in a stream operator Proceedings of the 2005 ACM SIGMOD international conference on Management of data - SIGMOD '05. pp. 1- 12 ,(2005) , 10.1145/1066157.1066159
Noga Alon, Yossi Matias, Mario Szegedy, The space complexity of approximating the frequency moments symposium on the theory of computing. pp. 20- 29 ,(1996) , 10.1145/237814.237823
Piotr Indyk, David Woodruff, Optimal approximations of the frequency moments of data streams Proceedings of the thirty-seventh annual ACM symposium on Theory of computing - STOC '05. pp. 202- 208 ,(2005) , 10.1145/1060590.1060621
Suresh Venkatasubramanian, Andrew McGregor, Sudipto Guha, Streaming and sublinear approximation of entropy and information distances symposium on discrete algorithms. pp. 733- 742 ,(2006) , 10.5555/1109557.1109637