Towards automatic software lineage inference

Authors: Jiyong Jang, Maverick Woo, David Brumley

Abstract: Software lineage refers to the evolutionary relationship among a collection of software. The goal of software lineage inference is to recover the lineage given a set of program binaries. Software lineage can provide extremely useful information in many security scenarios such as malware triage and software vulnerability tracking. In this paper, we systematically study software lineage inference by exploring four fundamental questions not addressed by prior work. First, how do we automatically infer software lineage from program binaries? Second, how do we measure the quality of lineage inference algorithms? Third, how useful are existing approaches to binary similarity analysis for inferring lineage in reality, and how about in an idealized setting? Fourth, what are the limitations that any software lineage inference algorithm must cope with? Towards these goals we build ILINE, a system for automatic software lineage inference of program binaries, and also IEVAL, a system for scientific assessment of lineage quality. We evaluated ILINE on two types of lineage (straight line and directed acyclic graph) with large-scale real-world programs: 1,777 goodware spanning over a combined 110 years of development history and 114 malware with known lineage collected by the DARPA Cyber Genome program. We used IEVAL to study seven metrics to assess the diverse properties of lineage. Our results reveal that partial order mismatches and graph arc edit distance often yield the most meaningful comparisons in our experiments. Even without assuming any prior information about the data sets, ILINE proved to be effective in lineage inference: it achieves a mean accuracy of over 84% for goodware and 72% for malware in our data sets.
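The abstract names two metrics, partial order mismatches and graph arc edit distance, without defining them. The sketch below shows one plausible reading of each on toy data. It is an illustration under stated assumptions, not the ILINE or IEVAL implementation: the function names, the version labels, and the simplification that both graphs share the same labeled nodes are all hypothetical.

```python
from itertools import combinations


def partial_order_mismatches(true_order, inferred_order):
    """Count pairs whose relative order differs between two straight-line lineages.

    Assumption: both lists contain the same program labels exactly once.
    """
    position = {name: i for i, name in enumerate(inferred_order)}
    mismatches = 0
    # combinations() yields (a, b) with a preceding b in the true lineage.
    for a, b in combinations(true_order, 2):
        if position[a] > position[b]:
            mismatches += 1
    return mismatches


def graph_arc_edit_distance(true_arcs, inferred_arcs):
    """Count arc insertions and deletions needed to turn one arc set into the other.

    Assumption: node correspondence is known, so only arcs are compared.
    """
    return len(set(true_arcs) ^ set(inferred_arcs))


# Hypothetical straight-line lineage: the guess swaps v2 and v3.
truth = ["v1", "v2", "v3", "v4"]
guess = ["v1", "v3", "v2", "v4"]
pom = partial_order_mismatches(truth, guess)

# Hypothetical DAG lineage: the guess has arc v2->v3 where the truth has v1->v3.
true_dag = {("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v3", "v4")}
guess_dag = {("v1", "v2"), ("v2", "v3"), ("v2", "v4"), ("v3", "v4")}
gaed = graph_arc_edit_distance(true_dag, guess_dag)

print(pom, gaed)
```

Both counts are raw; dividing the first by the number of pairs, n(n-1)/2, would give a normalized error rate comparable across data sets of different sizes.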

References (43)
Tudor Dumitras, Iulian Neamtiu. Experimental challenges in cyber security: a story of provenance and lineage for malware. USENIX Security Symposium, pp. 9-9 (2011).
Fanglu Guo, Peter Ferrie, Tzi-cker Chiueh. A Study of the Packer Problem and Its Solutions. Recent Advances in Intrusion Detection, pp. 98-115 (2008). DOI: 10.1007/978-3-540-87403-4_6
Fredrik Valeur, Christopher Kruegel, Giovanni Vigna, William Robertson. Static disassembly of obfuscated binaries. USENIX Security Symposium, pp. 18-18 (2004).
Viet Hung Nguyen, Fabio Massacci, Stephan Neuhaus. After-life vulnerabilities: a study on Firefox evolution, its vulnerabilities, and fixes. International Conference on Engineering Secure Software and Systems, pp. 195-208 (2011). DOI: 10.5555/1946341.1946361
Halvar Flake. Structural Comparison of Executable Objects. DIMVA, pp. 161-173 (2004). DOI: 10.17877/DE290R-2007
Meir M. Lehman, Juan F. Ramil. Rules and Tools for Software Evolution Planning and Management. Annals of Software Engineering, vol. 11, pp. 15-44 (2001). DOI: 10.1023/A:1012535017876
Konrad Rieck, Philipp Trinius, Carsten Willems, Thorsten Holz. Automatic analysis of malware behavior using machine learning. Journal of Computer Security, vol. 19, pp. 639-668 (2011). DOI: 10.3233/JCS-2010-0410
Ulrich Bayer, Paolo Milani Comparetti, Clemens Hlauschek, Christopher Kruegel, Engin Kirda. Scalable, behavior-based malware clustering. Network and Distributed System Security Symposium (2009).
Michael A. Bender, Martín Farach-Colton, Giridhar Pemmasani, Steven Skiena, Pavel Sumazin. Lowest common ancestors in trees and directed acyclic graphs. Journal of Algorithms, vol. 57, pp. 75-94 (2005). DOI: 10.1016/J.JALGOR.2005.08.001
T.J. McCabe. A Complexity Measure. IEEE Transactions on Software Engineering, vol. SE-2, pp. 308-320 (1976). DOI: 10.1109/TSE.1976.233837