Method and Computer Program Product for Finding the Longest Common Subsequences Between Files with Applications to Differential Compression

Author: Ramesh Agarwal

DOI:

Keywords: Algorithm; Greedy algorithm; Block size; File size; Longest common subsequence problem; Data compression; Suffix array; Hash function; Computer science; Offset (computer science)

Abstract: A differential compression method and computer program product combines hash value techniques and suffix array techniques. The invention finds the best matches for every offset of the version file, with respect to a certain granularity and above a certain length threshold. The invention has two variations depending on the block size choice. If the block size is kept fixed, its performance is similar to that of the greedy algorithm, without the expensive space and time requirements. If the block size is varied linearly with the reference file size, the invention can run in linear time and constant space. It has been shown empirically that the invention performs better than known algorithms in terms of speed.
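To make the idea of block-hash matching concrete, here is a minimal Python sketch. It is not the patented method: it replaces the suffix array with a plain sorted list of block hashes searched by binary search, recomputes each hash instead of using a rolling hash, and extends matches forward only. The function names (`build_block_index`, `extend_match`, `best_matches`), the choice of CRC-32 as the hash function, and the use of the block size as the length threshold are all assumptions made for illustration.

```python
import bisect
import zlib


def build_block_index(reference: bytes, block_size: int):
    """Hash each non-overlapping reference block.

    Returns (hash, offset) pairs sorted by hash, so equal hashes are adjacent.
    The sorted list is a simplified stand-in for a suffix array (assumption).
    """
    index = []
    for off in range(0, len(reference) - block_size + 1, block_size):
        index.append((zlib.crc32(reference[off:off + block_size]), off))
    index.sort()
    return index


def extend_match(reference: bytes, version: bytes, r_off: int, v_off: int) -> int:
    """Count how many bytes agree starting at the two given offsets."""
    n = 0
    while (r_off + n < len(reference) and v_off + n < len(version)
           and reference[r_off + n] == version[v_off + n]):
        n += 1
    return n


def best_matches(reference: bytes, version: bytes, block_size: int):
    """For every version offset, find the longest match that starts on a
    reference block boundary and is at least block_size bytes long.

    Returns {version_offset: (reference_offset, match_length)}.
    """
    index = build_block_index(reference, block_size)
    results = {}
    for v_off in range(0, len(version) - block_size + 1):
        h = zlib.crc32(version[v_off:v_off + block_size])
        # Binary search for the first index entry with this hash value.
        i = bisect.bisect_left(index, (h, -1))
        best = None
        while i < len(index) and index[i][0] == h:
            r_off = index[i][1]
            # Byte comparison verifies the candidate, filtering hash collisions.
            length = extend_match(reference, version, r_off, v_off)
            if length >= block_size and (best is None or length > best[1]):
                best = (r_off, length)
            i += 1
        if best is not None:
            results[v_off] = best
    return results


if __name__ == "__main__":
    ref = b"abcdefghijklmnop"
    ver = b"XXefghijklYY"
    print(best_matches(ref, ver, 4))
```

In this sketch the block size plays the role of the granularity: a larger block size means fewer index entries and less memory, at the cost of missing matches shorter than one block or not aligned to a reference block boundary.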