Authors: Irfan Ul Haq, Sergio Chica, Juan Caballero, Somesh Jha
DOI: 10.1016/J.COSE.2018.07.012
Keywords: Malware analysis, Hash function, Theoretical computer science, Computer science, Set (abstract data type), Source code, Malware, Executable, Lineage (genetic)
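Abstract: Malware lineage studies the evolutionary relationships among malware and has important applications for malware analysis. A persistent limitation of prior malware lineage approaches is to consider every input sample a separate malware version. This is problematic since a majority of malware samples are packed, and the packing process produces many polymorphic variants (i.e., executables with different file hash) of the same malware version. Thus, many samples correspond to the same version, and it is challenging to identify distinct malware versions from polymorphic variants. This problem does not manifest in prior approaches because they work on synthetic malware, malware that is not packed, or packed malware for which unpackers are available. In this work, we propose a novel malware lineage approach that works on malware samples collected in the wild. Given a set of malware executables from the same family, for which no source code is available and which may be packed, our approach produces a lineage graph where nodes are versions of the family and edges describe the relationships between versions. To enable our approach, we propose the first technique to identify the versions of a malware family and a scalable code indexing technique for determining shared functions between any pair of input samples. We have evaluated the accuracy of our approach on 13 open-source programs and have applied it to produce lineage graphs for 10 popular malware families. Our lineage graphs achieve, on average, a 26 times reduction from the number of input samples to the number of versions.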
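
The abstract mentions a code indexing technique for determining shared functions between any pair of samples but gives no details. As a rough illustration of the general idea only, and not the authors' method, the following sketch builds an inverted index from function hashes to samples and counts shared functions per pair. It assumes functions have already been extracted from unpacked code and are compared by exact SHA-256 hash; the names `function_hash`, `build_index`, `shared_function_counts`, and the toy byte strings are all hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def function_hash(code: bytes) -> str:
    """Hash one function's bytes (assumption: exact-match hashing)."""
    return hashlib.sha256(code).hexdigest()


def build_index(samples):
    """Map each function hash to the set of sample names containing it."""
    index = defaultdict(set)
    for name, functions in samples.items():
        for code in functions:
            index[function_hash(code)].add(name)
    return dict(index)


def shared_function_counts(index):
    """Count shared function hashes for each pair sharing at least one."""
    counts = defaultdict(int)
    for names in index.values():
        # Every pair of samples listed under one hash shares that function.
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Toy input: sample name -> list of function byte strings (made-up data).
samples = {
    "sample_a": [b"\x55\x8b\xec", b"\x90\x90\xc3"],
    "sample_b": [b"\x55\x8b\xec", b"\x31\xc0\xc3"],
    "sample_c": [b"\x31\xc0\xc3", b"\x90\x90\xc3", b"\x55\x8b\xec"],
}
shared = shared_function_counts(build_index(samples))
```

The index is built once and each sample is hashed once; pairs are then enumerated only among samples that share a given hash, which avoids a direct comparison of function lists for every pair. A very common function still produces many pairs, so a real system would likely need to filter or cap such entries.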