Fingerprints for highly similar streams

作者: Yoram Bachrach , Ely Porat

DOI: 10.1016/J.IC.2015.06.001

关键词:

摘要: We propose an approach for approximating the Jaccard similarity of two streams, J ( A , B ) = | ? domains where this is known to be high. Our method based on a reduction from F 2 norm estimation, which there exists sketch that efficient in terms both size and compute time, we augment by sampling technique. offers improvement fingerprint quadratic degree between streams. More precisely, approximate up multiplicative factor with confidence ?, it suffices take O ln 1 - t log minimal Further, computing our can done time per element stream.

参考文章(37)
Ely Porat, Ohad Lipsky, Improved Sketching of Hamming Distance with Error Correcting Combinatorial Pattern Matching. pp. 173- 182 ,(2007) , 10.1007/978-3-540-73437-6_19
Moses Charikar, Kevin Chen, Martin Farach-Colton, Finding Frequent Items in Data Streams international colloquium on automata languages and programming. ,vol. 312, pp. 693- 703 ,(2002) , 10.1016/S0304-3975(03)00400-6
Yoram Bachrach, Ralf Herbrich, Fingerprinting ratings for collaborative filtering: theoretical and empirical analysis string processing and information retrieval. pp. 25- 36 ,(2010) , 10.1007/978-3-642-16321-0_3
Guy Feigenblat, Ely Porat, Ariel Shiftan, Even Better Framework for min-wise Based Algorithms arXiv: Data Structures and Algorithms. ,(2011)
Martin Dietzfelbinger, Universal Hashing and k-Wise Independent Random Variables via Integer Arithmetic without Primes symposium on theoretical aspects of computer science. pp. 569- 580 ,(1996) , 10.1007/3-540-60922-9_46
Raphaël Clifford, Markus Jalsenius, Ely Porat, Benjamin Sach, Pattern Matching in Multiple Streams Combinatorial Pattern Matching. pp. 97- 109 ,(2012) , 10.1007/978-3-642-31265-6_8
Mayur Datar, S. Muthukrishnan, Estimating Rarity and Similarity over Data Stream Windows european symposium on algorithms. pp. 323- 334 ,(2002) , 10.1007/3-540-45749-6_31
Yoram Bachrach, Ely Porat, Fast Pseudo-Random Fingerprints arXiv: Data Structures and Algorithms. ,(2010)
Yoram Bachrach, Ralf Herbrich, Ely Porat, Sketching Algorithms for Approximating Rank Correlations in Collaborative Filtering Systems string processing and information retrieval. pp. 344- 352 ,(2009) , 10.1007/978-3-642-03784-9_34
Edith Cohen, Nick Duffield, Haim Kaplan, Carsten Lund, Mikkel Thorup, Sketching unaggregated data streams for subpopulation-size queries symposium on principles of database systems. pp. 253- 262 ,(2007) , 10.1145/1265530.1265566