On Estimating Frequency Moments of Data Streams

作者: Sumit Ganguly , Graham Cormode

DOI: 10.1007/978-3-540-74208-1_35

关键词: Data stream miningMultivariate random variableProduct (mathematics)Distribution (mathematics)EstimatorSpace (mathematics)MathematicsData matrix (multivariate statistics)CombinatoricsStructure (category theory)

摘要: Space-economical estimation of the pth frequency moments, defined as , for p> 0, are interest in estimating all-pairs distances a large data matrix [14], machine learning, and stream computation. Random sketches formed by inner product vector f 1 ..., n with suitably chosen random were pioneered Alon, Matias Szegedy [1], have since played central role F p computations general. The concept p-stable whose components drawn from distribution, was proposed Indyk 0 < p< 2, has been further studied Li [13]. In this paper, we consider problem sketch [7] structure [5]. Our algorithms require space $\tilde{O}(\frac{1}{\epsilon^{2+p}})$ to estimate within ±i¾?factors requires expected time $O(\log F_1 \log \frac{1}{\delta})$ process each update. Thus, our technique trades an $O(\frac{1}{\epsilon^p})$ factor much more efficient processing updates. We also present stand-alone iterative estimator .

参考文章(25)
Moses Charikar, Kevin Chen, Martin Farach-Colton, Finding Frequent Items in Data Streams international colloquium on automata languages and programming. ,vol. 312, pp. 693- 703 ,(2002) , 10.1016/S0304-3975(03)00400-6
Piotr Indyk, David Woodruff, Optimal Approximations of the Frequency Moments ,(2004)
Matthew Hennessy, Robin Milner, On Observing Nondeterminism and Concurrency international colloquium on automata, languages and programming. pp. 299- 309 ,(1980) , 10.1007/3-540-10003-2_79
Sumit Ganguly, Deepanjan Kesh, Chandan Saha, Practical Algorithms for Tracking Database Join Sizes Lecture Notes in Computer Science. pp. 297- 309 ,(2005) , 10.1007/11590156_24
Lakshminath Bhuvanagiri, Sumit Ganguly, Estimating Entropy over Data Streams Lecture Notes in Computer Science. pp. 148- 159 ,(2006) , 10.1007/11841036_16
Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, Luca Trevisan, Counting Distinct Elements in a Data Stream randomization and approximation techniques in computer science. pp. 1- 10 ,(2002) , 10.1007/3-540-45726-7_1
N. Nisan, Pseudorandom generators for space-bounded computations symposium on the theory of computing. pp. 204- 212 ,(1990) , 10.1145/100216.100242
Mark N. Wegman, J.Lawrence Carter, New hash functions and their use in authentication and set equality Journal of Computer and System Sciences. ,vol. 22, pp. 265- 279 ,(1981) , 10.1016/0022-0000(81)90033-7
Philippe Flajolet, G. Nigel Martin, Probabilistic counting algorithms for data base applications Journal of Computer and System Sciences. ,vol. 31, pp. 182- 209 ,(1985) , 10.1016/0022-0000(85)90041-8