Fine-Grained Compiler Identification With Sequence-Oriented Neural Modeling

Authors: Dinghao Wu, Lingwei Chen, Zhenzhou Tian, Yanping Chen, Borun Xie

DOI: 10.1109/ACCESS.2021.3069227

Keywords: Artificial neural network, Malware analysis, Programming language, Source code, Abstraction (linguistics), Identification (information), Recurrent neural network, Binary code, Compiler, Computer science

Abstract: Different compilers and optimization levels can be used to compile the source code. Revealed in reverse from the produced binaries, these compiler details facilitate essential binary analysis tasks, such as malware analysis and software forensics. Most existing approaches adopt a signature-matching-based or machine-learning-based strategy to identify the compiler details, showing limits in either detection accuracy or granularity. In this work, we propose NeuralCI (Neural modeling-based Compiler Identification) to infer these compiler details, including compiler family, optimization level, and compiler version, on individual functions. The basic idea is to formulate sequence-oriented neural networks to process normalized instruction sequences generated using a lightweight function abstraction strategy. To evaluate the performance of NeuralCI, a large dataset consisting of 854,858 unique functions collected from 19 widely used real-world projects is constructed. The experiments show that NeuralCI achieves, on average, 98.6% accuracy in identifying the compiler family, 95.3% in identifying the optimization level, 88.7% in identifying the compiler version, 94.8% in identifying the compiler family and optimization level together, and 83.0% in identifying all compiler components simultaneously, outperforming existing compiler identification methods in terms of both detection accuracy and comprehensiveness.
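
The abstract mentions normalized instruction sequences produced by a lightweight function abstraction strategy but does not spell out the rules. The sketch below shows one plausible form such a normalization could take, purely as an illustration: the placeholder tokens `MEM` and `IMM`, the helper names, and the Intel-syntax input are all assumptions, not the authors' actual scheme.

```python
import re

# Hypothetical rule: a bare decimal or hexadecimal literal is an immediate.
IMMEDIATE = re.compile(r"^-?(0x[0-9a-fA-F]+|\d+)$")


def normalize_operand(op: str) -> str:
    """Map one operand to a coarse token (assumed rules, not from the paper)."""
    op = op.strip()
    if "[" in op:              # memory reference such as DWORD PTR [rbp-0x4]
        return "MEM"
    if IMMEDIATE.match(op):    # constant or absolute address
        return "IMM"
    return op.lower()          # registers and anything else are kept


def normalize_instruction(ins: str) -> str:
    """Turn 'mnemonic op1, op2' into a space-separated normalized string."""
    mnemonic, _, rest = ins.strip().partition(" ")
    operands = [normalize_operand(o) for o in rest.split(",")] if rest.strip() else []
    return " ".join([mnemonic.lower()] + operands)


def normalize_function(instructions):
    """Normalize every instruction of one function, preserving order."""
    return [normalize_instruction(i) for i in instructions]


example = [
    "push rbp",
    "mov rbp, rsp",
    "mov DWORD PTR [rbp-0x4], 0x0",
    "call 0x401000",
    "ret",
]
normalized = normalize_function(example)
```

Abstracting away concrete addresses and constants in this way keeps the vocabulary small, so a sequence model would see compiler-specific instruction patterns rather than program-specific values.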
