MutantX-S: scalable malware clustering based on static features

Authors: Sandeep Bhatkar, Kang G. Shin, Kent Griffin, Xin Hu

Abstract: The current lack of automatic and speedy labeling of the large number (thousands) of malware samples seen every day delays the generation of malware signatures and has become a major challenge for the anti-virus industry. In this paper, we design, implement, and evaluate a novel, scalable framework, called MutantX-S, that can efficiently cluster a large number of malware samples into families based on programs' static features, i.e., code instruction sequences. MutantX-S is a unique combination of several novel techniques to address the practical challenges of malware clustering. Specifically, it exploits the instruction format of the x86 architecture and represents a program as a sequence of opcodes, facilitating the extraction of N-gram features. It also exploits the hashing trick recently developed in the machine learning community to reduce the dimensionality of the extracted feature vectors, thus significantly lowering the memory requirement and computation costs. Our comprehensive evaluation of a MutantX-S prototype using a database of more than 130,000 malware samples has shown its ability to correctly cluster over 80% of the samples within 2 hours, achieving a good balance between accuracy and scalability. Applying MutantX-S to malware samples created at different times, we also demonstrate that it achieves high accuracy in predicting labels for previously unknown malware.
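The two feature steps named in the abstract, opcode N-grams and the hashing trick, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `opcode_ngrams` and `hashed_features`, the use of MD5 as the hash function, the vector size of 16, and the toy opcode sequence are all assumptions made for illustration, and the paper may differ on each point.

```python
import hashlib
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count the contiguous n-grams in a sequence of opcode mnemonics."""
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


def hashed_features(ngram_counts, dim):
    """Fold n-gram counts into a fixed-length vector (hashing trick).

    Each n-gram is hashed to an index in [0, dim); counts that collide
    on the same index are summed. MD5 is an assumed, illustrative choice
    that keeps the mapping stable across runs.
    """
    vec = [0] * dim
    for gram, count in ngram_counts.items():
        digest = hashlib.md5(" ".join(gram).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % dim
        vec[index] += count
    return vec


# Toy opcode sequence (hypothetical, not taken from a real sample).
opcodes = ["push", "mov", "sub", "mov", "call", "mov", "sub", "mov"]
counts = opcode_ngrams(opcodes, 2)
vector = hashed_features(counts, 16)
```

The vector length stays fixed at `dim` no matter how many distinct N-grams occur, which is what bounds memory use. The price is that unrelated N-grams can collide in the same slot, so `dim` trades accuracy against cost.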
