Authors: Fabio Petroni, Leonardo Querzoni, Giuseppe Antonio Di Luna, Roberto Baldoni, Luca Massarelli
DOI:
Keywords: Theoretical computer science, Binary function, Feature extraction, Graph, Binary number, Binary code, Artificial neural network, Computer science, Embedding, Function (mathematics), Similarity (geometry)
Abstract: The binary similarity problem consists in determining if two functions are similar by only considering their compiled form. Advanced techniques for binary similarity have recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, vulnerability detection, etc., and thus have an immediate practical impact. Current solutions compare functions by first transforming their binary code into multi-dimensional vector representations (embeddings), and then comparing the vectors through simple and efficient geometric operations. However, embeddings are usually derived from binary code using manual feature extraction, which may fail to consider important function characteristics, or may consider features that are not important for the binary similarity problem. In this paper we propose SAFE, a novel architecture for the embedding of functions based on a self-attentive neural network. SAFE works directly on disassembled binary functions, does not require manual feature extraction, is computationally more efficient than existing solutions (i.e., it does not incur the computational overhead of building or manipulating control flow graphs), and is more general, as it works on stripped binaries and on multiple architectures. We report the results of a quantitative and qualitative analysis that show how SAFE provides a noticeable performance improvement with respect to previous solutions. Furthermore, we show how clusters of our embedding vectors are closely related to the semantics of the implemented algorithms, paving the way for further interesting applications (e.g. semantic-based binary function search).
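
The comparison step described in the abstract (embedding vectors compared through a simple geometric operation) can be illustrated with a minimal sketch. The abstract does not name the operation, so cosine similarity is an assumption here, and the three short vectors are made-up placeholders rather than real SAFE embeddings.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings (illustrative values only, not produced by SAFE).
emb_a = [0.9, 0.1, 0.3]    # function A
emb_b = [0.8, 0.2, 0.4]    # function B, assumed to resemble A
emb_c = [-0.5, 0.9, -0.1]  # function C, assumed unrelated to A

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

Under this assumption, a pair of functions would be judged similar when the score is close to 1; any decision threshold would have to be chosen empirically.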