Homomorphic fingerprints under misalignments

作者: Alexandr Andoni , Assaf Goldberger , Andrew McGregor , Ely Porat

DOI: 10.1145/2488608.2488726

关键词:

摘要: Fingerprinting is a widely-used technique for efficiently verifying that two files are identical. More generally, linear sketching form of lossy compression (based on random projections) also enables the "dissimilarity" non-identical to be estimated. Many sketches have been proposed dissimilarity measures decompose coordinate-wise such as Hamming distance between alphanumeric strings, or Euclidean vectors. However, virtually nothing known would accommodate alignment errors. With errors, distances rendered useless: small misalignment may result in file looks very dissimilar original according measures. In this paper, we present first sketch robust number Specifically, can used determine whether within being cyclic shift each other. Furthermore, homomorphic with respect rotations: it possible construct given only file. The relevant measure, distance, arises context embedding edit and our addressed an open problem [Question 13 Indyk-McGregor-Newman-Onak'11] rather surprising outcome. Our projects length $n$ into D(n) ⋅ polylog n dimensions where D(n)l divisors n. striking fact near-optimal, i.e., dependence inherent ostensibly about compression.In contrast, then show any estimating files, even when small, requires whose size nearly This lower bound addresses long-standing low distortion embeddings 2.15 Naor-Matousek'11, Indyk'01], case embeddings.

参考文章(46)
Hossein Jowhari, Efficient Communication Protocols for Deciding Edit Distance Algorithms – ESA 2012. pp. 648- 658 ,(2012) , 10.1007/978-3-642-33090-2_56
Ely Porat, Ohad Lipsky, Improved Sketching of Hamming Distance with Error Correcting Combinatorial Pattern Matching. pp. 173- 182 ,(2007) , 10.1007/978-3-540-73437-6_19
Moses Charikar, Kevin Chen, Martin Farach-Colton, Finding Frequent Items in Data Streams international colloquium on automata languages and programming. ,vol. 312, pp. 693- 703 ,(2002) , 10.1016/S0304-3975(03)00400-6
Yoram Bachrach, Ralf Herbrich, Fingerprinting ratings for collaborative filtering: theoretical and empirical analysis string processing and information retrieval. pp. 25- 36 ,(2010) , 10.1007/978-3-642-16321-0_3
P. Indyk, Algorithmic applications of low-distortion geometric embeddings international conference on cluster computing. pp. 10- 33 ,(2001) , 10.1109/SFCS.2001.959878
Raphaël Clifford, Klim Efremenko, Ely Porat, Amir Rothschild, k-Mismatch with Don’t Cares Algorithms – ESA 2007. pp. 151- 162 ,(2007) , 10.1007/978-3-540-75520-3_15
Andrew McGregor, Kook Jin Ahn, Sudipto Guha, Analyzing graph structure via linear measurements symposium on discrete algorithms. pp. 459- 467 ,(2012) , 10.5555/2095116.2095156
Ely Porat, Amir Rothschild, Klim Efremenko, Raphaël Clifford, From coding theory to efficient pattern matching symposium on discrete algorithms. pp. 778- 784 ,(2009) , 10.5555/1496770.1496855
Michael S. Crouch, Andrew McGregor, Periodicity and Cyclic Shifts via Linear Sketches Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. pp. 158- 170 ,(2011) , 10.1007/978-3-642-22935-0_14
Ely Porat, Amir Rothschild, Explicit Non-adaptive Combinatorial Group Testing Schemes international colloquium on automata languages and programming. pp. 748- 759 ,(2008) , 10.1007/978-3-540-70575-8_61