作者: Neta Barkay , Ely Porat , Bar Shalem
DOI: 10.1007/978-3-642-40164-0_8
关键词:
摘要: We study the problem of generating a large sample from data stream elements (i,v), where consists pairs (i,Ci) for Ci=∑(i,v)∈streamv. consider strict turnstile streams and general non-strict streams, in which Ci may be negative. Our is useful approximating both forward inverse distribution statistics, within an additive error e provable success probability 1−δ. Our sampling method improves by order magnitude known processing time each element, crucial factor applications, thereby providing feasible solution to problem. For example, size O(e−2 log(1/δ)) our requires O((loglog(1/e))2+(loglog(1/δ)) 2) operations per whereas best previous log2(1/δ)) evaluations fully independent hash function element. We achieve this improvement constructing efficient K-elements recovery structure K can extracted with 1−δ. enables algorithm run on distributed systems extract statistics difference between streams.