Authors: Jiyong Jang, Maverick Woo, David Brumley
DOI:
Keywords:
Abstract: Software lineage refers to the evolutionary relationship among a collection of software. The goal of software lineage inference is to recover the lineage given a set of program binaries. Software lineage can provide extremely useful information in many security scenarios such as malware triage and software vulnerability tracking. In this paper, we systematically study software lineage inference by exploring four fundamental questions not addressed by prior work. First, how do we automatically infer software lineage from program binaries? Second, how do we measure the quality of lineage inference algorithms? Third, how useful are existing approaches to binary similarity analysis for inferring lineage in reality, and how about in an idealized setting? Fourth, what are the limitations that any software lineage inference algorithm must cope with? Towards these goals we build ILINE, a system for automatic software lineage inference of program binaries, and also IEVAL, a system for scientific assessment of lineage quality. We evaluated ILINE on two types of lineage, straight line and directed acyclic graph, with large-scale real-world programs: 1,777 goodware spanning over a combined 110 years of development history and 114 malware with known lineage collected by the DARPA Cyber Genome program. We used IEVAL to study seven metrics to assess the diverse properties of lineage. Our results reveal that partial order mismatches and graph arc edit distance often yield the most meaningful comparisons in our experiments. Even without assuming any prior information about the data sets, ILINE proved to be effective in lineage inference: it achieves a mean accuracy of over 84% for goodware and 72% for malware in our data sets.