Authors: Dinghao Wu, Lingwei Chen, Zhenzhou Tian, Yanping Chen, Borun Xie
DOI: 10.1109/ACCESS.2021.3069227
Keywords: Artificial neural network, Malware analysis, Programming language, Source code, Abstraction (linguistics), Identification (information), Recurrent neural network, Binary code, Compiler, Computer science
Abstract: Different compilers and optimization levels can be used to compile the same source code. When recovered from the produced binaries, these compiler details facilitate essential binary analysis tasks, such as malware analysis and software forensics. Most existing approaches adopt a signature-matching-based or machine-learning-based strategy to identify the compiler details, and are limited in either detection accuracy or granularity. In this work, we propose NeuralCI (Neural modeling-based Compiler Identification) to infer these compiler details, including compiler family, optimization level, and compiler version, on individual functions. The basic idea is to formulate sequence-oriented neural networks that process normalized instruction sequences generated using a lightweight function abstraction strategy. To evaluate the performance of NeuralCI, a large dataset consisting of 854,858 unique functions collected from 19 widely used real-world projects is constructed. The experiments show that NeuralCI achieves, on average, 98.6% accuracy in identifying the compiler family, 95.3% in identifying the optimization level, 88.7% in identifying the compiler version, 94.8% in identifying the compiler family and optimization level together, and 83.0% in identifying all compiler components simultaneously, outperforming existing compiler identification methods in terms of both detection accuracy and comprehensiveness.
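The abstract mentions that functions are turned into normalized instruction sequences before being fed to a sequence model, but it does not spell out the abstraction rules. The following is only a minimal sketch of what such a normalization step might look like. The rules (replace immediates with `IMM`, replace memory references with `MEM`, keep mnemonics and registers), the Intel-style input syntax, and all function names are assumptions made for illustration, not the authors' actual strategy.

```python
import re

# Hypothetical abstraction rules (assumptions, not taken from the paper):
#   - any operand containing "[" is a memory reference -> "MEM"
#   - any decimal or hexadecimal constant              -> "IMM"
#   - mnemonics and register names are kept, lowercased
_IMMEDIATE = re.compile(r"^-?(0x[0-9a-fA-F]+|\d+)$")


def normalize_operand(operand: str) -> str:
    """Map one operand to an abstract token."""
    operand = operand.strip()
    if "[" in operand:
        return "MEM"
    if _IMMEDIATE.match(operand):
        return "IMM"
    return operand.lower()


def normalize_instruction(instruction: str) -> str:
    """Normalize one Intel-syntax instruction such as 'mov eax, 0x10'."""
    mnemonic, _, rest = instruction.strip().partition(" ")
    operands = [normalize_operand(o) for o in rest.split(",") if o.strip()]
    return " ".join([mnemonic.lower()] + operands)


def normalize_function(instructions):
    """Turn a function's instruction list into a normalized token sequence."""
    return [normalize_instruction(i) for i in instructions]


if __name__ == "__main__":
    sample = ["mov eax, 0x10", "mov DWORD PTR [rbp-0x8], eax", "ret"]
    for line in normalize_function(sample):
        print(line)
```

Abstracting away concrete constants and addresses in this way keeps the vocabulary small, so a sequence model would see compiler-specific instruction patterns rather than program-specific values.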