Sampling from Dense Streams without Penalty

作者: Vladimir Braverman , Gregory Vorsanger

DOI: 10.1007/978-3-319-08783-2_2

关键词:

摘要: We investigate the ability to sample relatively small amounts of data from a stream and approximately calculate statistics on original stream. McGregor et al. [29] provide worst case theoretical bounds that show space costs for sampling are inversely correlated with rate. Indeed, while lower bound cannot be improved in general case, we it is possible improve D domain n, when average positive frequency μ = F 1/F 0 sufficiently large. consider following range parameters: ≥ log(n) rate p C k − 1log(n), where constant. On these streams \(\tilde{O} ({1 \over p} n^{1-2/k})\) \( \tilde{O} (n^{1-2/k})\) thus giving polynomial improvement large 1.

参考文章(36)
Christos H. Papadimitriou, Ziv Bar-Yossef, The complexity of massive data set computations University of California at Berkeley. ,(2002)
Guandong Xu, Yanchun Zhang, Lin Li, Algorithms and Techniques Springer US. pp. 29- 68 ,(2011) , 10.1007/978-1-4419-7735-9_3
Vladimir Braverman, Rafail Ostrovsky, Generalizing the Layering Method of Indyk and Woodruff: Recursive Sketches for Frequency-Based Vectors on Streams Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. pp. 58- 70 ,(2013) , 10.1007/978-3-642-40328-6_5
Matthew Hennessy, Robin Milner, On Observing Nondeterminism and Concurrency international colloquium on automata, languages and programming. pp. 299- 309 ,(1980) , 10.1007/3-540-10003-2_79
Sumit Ganguly, Graham Cormode, On Estimating Frequency Moments of Data Streams international workshop and international workshop on approximation randomization and combinatorial optimization algorithms and techniques. pp. 479- 493 ,(2007) , 10.1007/978-3-540-74208-1_35
Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, Luca Trevisan, Counting Distinct Elements in a Data Stream randomization and approximation techniques in computer science. pp. 1- 10 ,(2002) , 10.1007/3-540-45726-7_1
Vladimir Braverman, Rafail Ostrovsky, Dan Vilenchik, How Hard Is Counting Triangles in the Streaming Model? Automata, Languages, and Programming. pp. 244- 254 ,(2013) , 10.1007/978-3-642-39206-1_21
Vladimir Braverman, Rafail Ostrovsky, Approximating Large Frequency Moments with Pick-and-Drop Sampling Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. pp. 42- 57 ,(2013) , 10.1007/978-3-642-40328-6_4
Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, On random sampling over joins ACM SIGMOD Record. ,vol. 28, pp. 263- 274 ,(1999) , 10.1145/304181.304206
Don Coppersmith, Ravi Kumar, An improved data stream algorithm for frequency moments symposium on discrete algorithms. pp. 151- 156 ,(2004) , 10.5555/982792.982815