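Abstract: We propose an approach for approximating the Jaccard similarity of two streams, J(A, B) = |A ∩ B| / |A ∪ B|, in domains where this similarity is known to be high. Our method is based on a reduction from Jaccard similarity to F2 norm estimation, for which there exists a sketch that is efficient in terms of both size and compute time, and which we augment by a sampling technique. Our approach offers an improvement in fingerprint size that is quadratic in the degree of similarity between the streams. More precisely, to approximate the Jaccard similarity up to a given multiplicative factor with a given confidence, it suffices to take a fingerprint of size O(ln … 1 − t … log …), where t is the minimal similarity between the streams. Further, computing our fingerprint can be done in … time per element of the stream.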
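To make the reduction concrete, the following is a minimal illustrative sketch and not the paper's construction. It assumes each stream is a set of distinct items, so that the squared L2 distance between the two indicator vectors equals |A Δ B|, which gives J(A, B) = (|A| + |B| − F2) / (|A| + |B| + F2). F2 is then estimated with a generic AMS-style sign sketch. The function names, the BLAKE2b-derived signs (standing in for a 4-wise independent hash family), and the parameter `k` are all assumptions made for illustration; the sampling technique mentioned above is not reproduced, and this version spends O(k) time per element.

```python
import hashlib


def jaccard_exact(a, b):
    """Reference value: |A ∩ B| / |A ∪ B| computed directly on sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def jaccard_from_f2(size_a, size_b, f2):
    """Jaccard similarity from |A|, |B| and F2 = |A Δ B|.

    Uses |A ∩ B| = (|A| + |B| - F2) / 2 and |A ∪ B| = (|A| + |B| + F2) / 2.
    """
    return (size_a + size_b - f2) / (size_a + size_b + f2)


def signs(item, k, seed):
    """Return k pseudo-random +/-1 signs for an item (hypothetical helper).

    Assumption: hash bits stand in for a 4-wise independent sign family.
    k must be a multiple of 8 and at most 512 (the BLAKE2b digest limit).
    """
    digest = hashlib.blake2b(
        repr((seed, item)).encode(), digest_size=k // 8
    ).digest()
    bits = int.from_bytes(digest, "big")
    return [1 if (bits >> j) & 1 else -1 for j in range(k)]


def sketch(stream, k=256, seed=0):
    """AMS-style linear sketch: k counters, each a signed sum over the stream."""
    counters = [0] * k
    for item in stream:
        for j, s in enumerate(signs(item, k, seed)):
            counters[j] += s
    return counters


def estimate_jaccard(sk_a, size_a, sk_b, size_b):
    """Estimate J(A, B) from two sketches built with the same k and seed."""
    # The sketch is linear, so the counter differences sketch the vector a - b;
    # the mean of their squares is an unbiased estimate of F2 = |A Δ B|.
    f2 = sum((u - v) ** 2 for u, v in zip(sk_a, sk_b)) / len(sk_a)
    return jaccard_from_f2(size_a, size_b, f2)


A = range(0, 1000)
B = range(50, 1050)
est = estimate_jaccard(sketch(A), len(A), sketch(B), len(B))
print("exact:", jaccard_exact(A, B), "estimated:", est)
```

Because the estimate depends only on the symmetric difference, its absolute error shrinks as the streams become more similar, which is consistent with the intuition that highly similar streams admit smaller fingerprints. A production version would typically use a median of means over independent counter groups rather than a single average.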