A simple sketching algorithm for entropy estimation over streaming data.

作者: Ioana Cosma , Peter Clifford

DOI:

关键词:

摘要: We consider the problem of approximating empirical Shannon entropy a highfrequency data stream under relaxed strict-turnstile model, when space limitations make exact computation infeasible. An equivalent measure is Renyi that depends on constant α. This quantity can be estimated efficiently and unbiasedly from low-dimensional synopsis called an α-stable sketch via method compressed counting. approximation to obtained by taking α sufficiently close 1. However, practical guidelines for parameter calibration with respect are lacking. avoid this showing random variables used in estimating transformed have proper distributional limit as approaches 1: maximally skewed, strictly stable distribution = 1 defined entire real line. propose family asymptotically unbiased log-mean estimators entropy, indexed ζ > 0, computed single-pass algorithm provide additive approximation. recommend estimator has exponentially decreasing tail bounds error probability, asymptotic relative efficiency 0.932, near-optimal computational complexity. Appearing Proceedings 16 International Conference Artificial Intelligence Statistics (AISTATS) 2013, Scottsdale, AZ, USA. Volume 31 JMLR: W&CP 31. Copyright 2013 authors.

参考文章(24)
V. M. Zolotarev, One-dimensional stable distributions ,(1986)
Charu C Aggarwal, None, Data Streams: Models and Algorithms Springer Publishing Company, Incorporated. ,(2014)
Amit Chakrabarti, Khanh Do Ba, S. Muthukrishnan, Estimating entropy and entropy norm on data streams symposium on theoretical aspects of computer science. ,vol. 3, pp. 196- 205 ,(2006) , 10.1007/11672142_15
Lakshminath Bhuvanagiri, Sumit Ganguly, Estimating Entropy over Data Streams Lecture Notes in Computer Science. pp. 148- 159 ,(2006) , 10.1007/11841036_16
S. Kullback, R. A. Leibler, On Information and Sufficiency Annals of Mathematical Statistics. ,vol. 22, pp. 79- 86 ,(1951) , 10.1214/AOMS/1177729694
Piotr Indyk, Andrew McGregor, Declaring independence via the sketching of sketches symposium on discrete algorithms. pp. 737- 745 ,(2008)
Haiquan (Chuck) Zhao, Ashwin Lall, Mitsunori Ogihara, Oliver Spatscheck, Jia Wang, Jun Xu, A data streaming algorithm for estimating entropies of od flows Proceedings of the 7th ACM SIGCOMM conference on Internet measurement - IMC '07. pp. 279- 290 ,(2007) , 10.1145/1298306.1298345
Vladimir Braverman, Rafail Ostrovsky, Measuring independence of datasets Proceedings of the 42nd ACM symposium on Theory of computing - STOC '10. pp. 271- 280 ,(2010) , 10.1145/1806689.1806728
Ashwin Lall, Vyas Sekar, Mitsunori Ogihara, Jun Xu, Hui Zhang, Data streaming algorithms for estimating entropy of network traffic ACM SIGMETRICS Performance Evaluation Review. ,vol. 34, pp. 145- 156 ,(2006) , 10.1145/1140103.1140295