作者: Suresh Venkatasubramanian , Andrew McGregor , Sudipto Guha
关键词: Bounded function 、 Hellinger distance 、 Sublinear function 、 Mathematics 、 Binary logarithm 、 Upper and lower bounds 、 Entropy (information theory) 、 Discrete mathematics 、 Property testing 、 Random permutation
摘要: In most algorithmic applications which compare two distributions, information theoretic distances are more natural than standard lp norms. this paper we design streaming and sublinear time property testing algorithms for entropy various distances.Batu et al posed the problem of with respect to Jensen-Shannon distance. We present optimal estimating bounded, symmetric f-divergences (including divergence Hellinger distance) between distributions in frameworks. Along way, close a (log n)/H gap upper lower bounds H, yielding an algorithm over all values entropy. data stream setting (sublinear space), give first distribution. Our runs polylogarithmic space yields asymptotic constant factor approximation scheme. An integral part is interesting use F0 (the number distinct elements set) estimation algorithm; also provide other results along space/time/approximation tradeoff curve.Our have structural implications that connect constrained algorithms. The mediating model random order model, assumes input permutation multiset was considered by Munro Paterson 1980. show any combined oracle calculating invariant functions can be simulated single pass. This addresses question raised Feigenbaum regarding relationship Further, polylog-space PTAS one pass stream. bound cannot achieved (generalized testing) model.