作者: M. D. McIlroy , J. W. Hunt
DOI:
关键词: Imaginary line 、 Longest increasing subsequence 、 Binary search algorithm 、 File comparison 、 Computer science 、 Algorithm 、 Hash function 、 Append 、 Line (text file) 、 Longest common subsequence problem
摘要: The program diff reports differences between two files, expressed as a minimal list of line changes to bring either file into agreement with the other. Diff has been engineered make efficient use time and space on typical inputs that arise in vetting version-to-version computer-maintained or computer-generated documents. Time usage are observed vary about sum lengths real data, although they known product worst case. central algorithm solves ‘longest common subsequence problem’ find lines do not change files. Practical efficiency is gained by attending only certain critical ‘candidate’ matches breaking which would shorten longest some pair initial segments Various techniques hashing, presorting equivalence classes, merging binary search, dynamic storage allocation used obtain good performance. [This document was scanned from Bell Laboratories Computing Science Technical Report #41, dated July 1976. Te xt converted OCR hand-corrected (last changed June, 2012). Figures were reconstructed. Some errors may remain, especially tables equations. Please report them doug@cs.dartmouth.edu.] creates what one have be it second vice versa. It based ideas several sources[1,2,7,8]. As an example its work, consider listed horizontally for brevity: b c d e f g w x y z easy see first can made following prescription, imaginary 0 understood at beginning each: append after 0: w, 3 through 4, were: into: z, delete 6 7, g. Going other way, this way: 1, was: 4 6, d, 7: Delete, operations available . indicates 1-letter