Measuring $k$-Wise Independence of Streaming Data

作者: Vladimir Braverman , Rafail Ostrovsky

DOI:

关键词:

摘要: In their seminal work, Alon, Matias, and Szegedy introduced several sketching techniques, including showing that 4-wise independence is sufficient to obtain good approximations of the second frequency moment. this we show technique can be extended product domains $[n]^k$ by using independent functions on $[n]$. Our work extends Indyk McGregor, who showed result for $k = 2$. Their primary motivation was problem identifying correlations in data streams. model, a stream pairs $(i,j) \in [n]^2$ arrive, giving joint distribution $(X,Y)$, they find approximation algorithms how close marginal distributions under various metrics, which naturally corresponds $X$ $Y$ are being independent. By our technique, new approximating $\ell_2$ distance between $k$-ary vectors, instead just pairs, single pass. analysis gives randomized algorithm $(1 \pm \epsilon)$ (with probability $1-\delta$) requires space logarithmic $n$ $m$ proportional $3^k$.

参考文章(29)
T. Batu, E. Fischer, L. Fortnow, R. Kumar, R. Rubinfeld, P. White, Testing random variables for independence and identity international conference on cluster computing. pp. 442- 451 ,(2001) , 10.1109/SFCS.2001.959920
Charu C Aggarwal, None, Data Streams: Models and Algorithms Springer Publishing Company, Incorporated. ,(2014)
Sudipto Guha, Piotr Indyk, Andrew McGregor, Sketching information divergences conference on learning theory. pp. 424- 438 ,(2007) , 10.1007/978-3-540-72927-3_31
Noga Alon, László Babai, Alon Itai, A fast and simple randomized parallel algorithm for the maximal independent set problem Journal of Algorithms. ,vol. 7, pp. 567- 583 ,(1985) , 10.1016/0196-6774(86)90019-2
Piotr Indyk, Andrew McGregor, Declaring independence via the sketching of sketches symposium on discrete algorithms. pp. 737- 745 ,(2008)
Kevin L. Chang, Ravi Kannan, The space complexity of pass-efficient algorithms for clustering symposium on discrete algorithms. pp. 1157- 1166 ,(2006) , 10.5555/1109557.1109685
Don Coppersmith, Ravi Kumar, An improved data stream algorithm for frequency moments symposium on discrete algorithms. pp. 151- 156 ,(2004) , 10.5555/982792.982815
Noga Alon, Oded Goldreich, Yishay Mansour, Almost k -wise independence versus k -wise independence Information Processing Letters. ,vol. 88, pp. 107- 110 ,(2003) , 10.1016/S0020-0190(03)00359-4
Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom, Models and issues in data stream systems symposium on principles of database systems. pp. 1- 16 ,(2002) , 10.1145/543613.543615