Efficient Sampling of Non-strict Turnstile Data Streams

作者: Neta Barkay , Ely Porat , Bar Shalem

DOI: 10.1007/978-3-642-40164-0_8

关键词:

摘要: We study the problem of generating a large sample from data stream elements (i,v), where consists pairs (i,Ci) for Ci=∑(i,v)∈streamv. consider strict turnstile streams and general non-strict streams, in which Ci may be negative. Our is useful approximating both forward inverse distribution statistics, within an additive error e provable success probability 1−δ. Our sampling method improves by order magnitude known processing time each element, crucial factor applications, thereby providing feasible solution to problem. For example, size O(e−2 log(1/δ)) our requires O((loglog(1/e))2+(loglog(1/δ)) 2) operations per whereas best previous log2(1/δ)) evaluations fully independent hash function element. We achieve this improvement constructing efficient K-elements recovery structure K can extracted with 1−δ. enables algorithm run on distributed systems extract statistics difference between streams.

参考文章(30)
Ely Porat, Ohad Lipsky, Improved Sketching of Hamming Distance with Error Correcting Combinatorial Pattern Matching. pp. 173- 182 ,(2007) , 10.1007/978-3-540-73437-6_19
Joachim Von Zur Gathen, Jurgen Gerhard, Modern Computer Algebra ,(1999)
Mayur Datar, S. Muthukrishnan, Estimating Rarity and Similarity over Data Stream Windows european symposium on algorithms. pp. 323- 334 ,(2002) , 10.1007/3-540-45749-6_31
Moni Naor, Vanessa Teague, Anti-presistence Proceedings of the thirty-third annual ACM symposium on Theory of computing - STOC '01. pp. 492- 501 ,(2001) , 10.1145/380752.380844
I. Gohberg, V. Olshevsky, Fast Algorithms with Preprocessing for Matrix-Vector Multiplication Problems Journal of Complexity. ,vol. 10, pp. 411- 427 ,(1994) , 10.1006/JCOM.1994.1021
Piotr Indyk, A small approximately min-wise independent family of hash functions symposium on discrete algorithms. ,vol. 38, pp. 454- 456 ,(1999) , 10.1006/JAGM.2000.1131
Devdatt Dubhashi, Desh Ranjan, Balls and bins: a study in negative dependence Random Structures and Algorithms. ,vol. 13, pp. 99- 124 ,(1998) , 10.1002/(SICI)1098-2418(199809)13:2<99::AID-RSA1>3.0.CO;2-M
Anna Pagh, Rasmus Pagh, Uniform Hashing in Constant Time and Optimal Space SIAM Journal on Computing. ,vol. 38, pp. 85- 96 ,(2008) , 10.1137/060658400
Daniele Micciancio, Oblivious data structures: applications to cryptography symposium on the theory of computing. pp. 456- 464 ,(1997) , 10.1145/258533.258638