作者: Ioana Cosma , Peter Clifford
DOI:
关键词:
摘要: We consider the problem of approximating empirical Shannon entropy a highfrequency data stream under relaxed strict-turnstile model, when space limitations make exact computation infeasible. An equivalent measure is Renyi that depends on constant α. This quantity can be estimated efficiently and unbiasedly from low-dimensional synopsis called an α-stable sketch via method compressed counting. approximation to obtained by taking α sufficiently close 1. However, practical guidelines for parameter calibration with respect are lacking. avoid this showing random variables used in estimating transformed have proper distributional limit as approaches 1: maximally skewed, strictly stable distribution = 1 defined entire real line. propose family asymptotically unbiased log-mean estimators entropy, indexed ζ > 0, computed single-pass algorithm provide additive approximation. recommend estimator has exponentially decreasing tail bounds error probability, asymptotic relative efficiency 0.932, near-optimal computational complexity. Appearing Proceedings 16 International Conference Artificial Intelligence Statistics (AISTATS) 2013, Scottsdale, AZ, USA. Volume 31 JMLR: W&CP 31. Copyright 2013 authors.