作者: Alexandr Andoni , Assaf Goldberger , Andrew McGregor , Ely Porat
关键词:
摘要: Fingerprinting is a widely-used technique for efficiently verifying that two files are identical. More generally, linear sketching form of lossy compression (based on random projections) also enables the "dissimilarity" non-identical to be estimated. Many sketches have been proposed dissimilarity measures decompose coordinate-wise such as Hamming distance between alphanumeric strings, or Euclidean vectors. However, virtually nothing known would accommodate alignment errors. With errors, distances rendered useless: small misalignment may result in file looks very dissimilar original according measures. In this paper, we present first sketch robust number Specifically, can used determine whether within being cyclic shift each other. Furthermore, homomorphic with respect rotations: it possible construct given only file. The relevant measure, distance, arises context embedding edit and our addressed an open problem [Question 13 Indyk-McGregor-Newman-Onak'11] rather surprising outcome. Our projects length $n$ into D(n) ⋅ polylog n dimensions where D(n)l divisors n. striking fact near-optimal, i.e., dependence inherent ostensibly about compression.In contrast, then show any estimating files, even when small, requires whose size nearly This lower bound addresses long-standing low distortion embeddings 2.15 Naor-Matousek'11, Indyk'01], case embeddings.