SAFE: Self-Attentive Function Embeddings for Binary Similarity

Authors: Fabio Petroni, Leonardo Querzoni, Giuseppe Antonio Di Luna, Roberto Baldoni, Luca Massarelli

Keywords: Theoretical computer science, Binary function, Feature extraction, Graph, Binary number, Binary code, Artificial neural network, Computer science, Embedding, Function (mathematics), Similarity (geometry)

Abstract: The binary similarity problem consists in determining if two functions are similar by only considering their compiled form. Advanced techniques for binary similarity have recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, and vulnerability detection, and thus have an immediate practical impact. Current solutions compare functions by first transforming their binary code into multi-dimensional vector representations (embeddings), and then comparing the vectors through simple and efficient geometric operations. However, embeddings are usually derived from binary code using manual feature extraction, which may fail to consider important function characteristics, or may consider features that are not important for the binary similarity problem. In this paper we propose SAFE, a novel architecture for the embedding of functions based on a self-attentive neural network. SAFE works directly on disassembled binary functions, does not require manual feature extraction, is computationally more efficient than existing solutions (i.e., it does not incur the computational overhead of building or manipulating control flow graphs), and is more general, as it works on stripped binaries and on multiple architectures. We report the results of a quantitative and qualitative analysis that show how SAFE provides a noticeable performance improvement with respect to previous solutions. Furthermore, we show how clusters of our embedding vectors are closely related to the semantics of the implemented algorithms, paving the way for further interesting applications (e.g. semantic-based binary function search).
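
The abstract describes comparing functions through simple geometric operations on their embeddings. The following is a minimal sketch of that comparison step only, assuming cosine similarity as the geometric operation. The function names, the 4-dimensional vectors, and the index entries are hypothetical and chosen for illustration; they are not taken from the paper or from the SAFE implementation, and no embedding network is modeled here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, index):
    """Return (name, score) pairs sorted by decreasing similarity to query."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings: real ones would come from a trained model
# and would have many more dimensions.
index = {
    "memcpy_x86": [0.9, 0.1, 0.0, 0.2],
    "memcpy_arm": [0.8, 0.2, 0.1, 0.2],
    "sha1_round": [0.0, 0.9, 0.7, 0.1],
}
query = [0.85, 0.15, 0.05, 0.2]

ranking = most_similar(query, index)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because each function is reduced to a fixed-length vector once, a lookup like `most_similar` only needs vector arithmetic at query time, which is why this style of comparison is cheap relative to approaches that match graphs pairwise.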
