Malware classification using deep learning methods

Authors: Bugra Cakir, Erdogan Dogdu

DOI: 10.1145/3190645.3190692

Keywords: Task (computing), Malware, Feature extraction, Supervised learning, Opcode, Deep learning, Gradient boosting, Word2vec, Artificial intelligence, Computer science, Machine learning

Abstract: Malware, short for Malicious Software, is growing continuously in numbers and sophistication as our digital world continues to grow. It is a very serious problem, and many efforts are devoted to malware detection in today's cybersecurity world. Many machine learning algorithms have been used for the automatic detection of malware in recent years. Most recently, deep learning is being used with better performance. Deep learning models are shown to work much better in the analysis of long sequences of system calls. In this paper, a shallow deep learning-based feature extraction method (word2vec) is used for representing any given malware based on its opcodes. The Gradient Boosting algorithm is used for the classification task. Then, k-fold cross-validation is used to validate the model performance without sacrificing a validation split. Evaluation results show up to 96% accuracy with limited sample data.
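The k-fold validation step mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code. The helper names (`k_fold_indices`, `cross_val_accuracy`, `majority_trainer`) are hypothetical, the opcode lists and labels are made-up toy data, and a trivial majority-label classifier stands in for the word2vec plus Gradient Boosting pipeline, which the abstract does not specify in detail. The folds here are contiguous and unshuffled; a real experiment would typically shuffle or stratify them.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence[str]], str]


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def majority_trainer(train_x: Sequence[Sequence[str]], train_y: Sequence[str]) -> Predictor:
    """Stand-in classifier (assumption): always predicts the most common training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda opcodes: most_common


def cross_val_accuracy(samples, labels, k, trainer) -> float:
    """Average test accuracy over k folds, retraining on each training split."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        predict = trainer([samples[i] for i in train_idx],
                          [labels[i] for i in train_idx])
        correct = sum(predict(samples[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy opcode sequences and family labels, invented for illustration only.
    toy_samples = [["mov", "push", "call"], ["xor", "jmp"], ["mov", "ret"],
                   ["push", "pop"], ["call", "ret"], ["xor", "xor", "jmp"]]
    toy_labels = ["trojan", "worm", "trojan", "trojan", "worm", "trojan"]
    print(cross_val_accuracy(toy_samples, toy_labels, k=3, trainer=majority_trainer))
```

Because every sample lands in exactly one test fold, all of the limited data contributes to both training and evaluation, which is the property the abstract refers to as not sacrificing a validation split.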

References (16)
A.H. Sung, J. Xu, P. Chavez, S. Mukkamala. Static analyzer of vicious executables (SAVE). Annual Computer Security Applications Conference, pp. 326-334 (2004). DOI: 10.1109/CSAC.2004.37
Babak Yadegari, Brian Johannesmeyer, Ben Whitely, Saumya Debray. A Generic Approach to Automatic Deobfuscation of Executable Code. 2015 IEEE Symposium on Security and Privacy, pp. 674-691 (2015). DOI: 10.1109/SP.2015.47
Mihai Christodorescu, Somesh Jha. Static analysis of executables to detect malicious patterns. USENIX Security Symposium, pp. 12-12 (2003). DOI: 10.21236/ADA449067
Razvan Pascanu, Jack W. Stokes, Hermineh Sanossian, Mady Marinescu, Anil Thomas. Malware classification with recurrent networks. International Conference on Acoustics, Speech, and Signal Processing, pp. 1916-1920 (2015). DOI: 10.1109/ICASSP.2015.7178304
Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, vol. 29, pp. 1189-1232 (2001). DOI: 10.1214/AOS/1013203451
Joshua Saxe, Konstantin Berlin. Deep neural network based malware detection using two dimensional binary program features. International Conference on Malicious and Unwanted Software, pp. 11-20 (2015). DOI: 10.1109/MALWARE.2015.7413680
Mikhail Zolotukhin, Timo Hämäläinen. Detection of zero-day malware based on the analysis of opcode sequences. Consumer Communications and Networking Conference, pp. 386-391 (2014). DOI: 10.1109/CCNC.2014.6866599
S. Momina Tabish, M. Zubair Shafiq, Muddassar Farooq. Malware detection using statistical analysis of byte-level file content. Knowledge Discovery and Data Mining, pp. 23-31 (2009). DOI: 10.1145/1599272.1599278
Igor Santos, Felix Brezo, Xabier Ugarte-Pedrero, Pablo G Bringas. Opcode sequences as representation of executables for data-mining-based unknown malware detection. Information Sciences, vol. 231, pp. 64-82 (2013). DOI: 10.1016/J.INS.2011.08.020
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, Jeff Dean. Distributed Representations of Words and Phrases and their Compositionality. Neural Information Processing Systems, vol. 26, pp. 3111-3119 (2013)