Detecting Malware with Information Complexity

Authors: George Danezis, Nadia Alshahwan, Earl T. Barr, David Clark

DOI:

Keywords:

Abstract: This work focuses on a specific front of the malware detection arms-race, namely the detection of persistent, disk-resident malware. We exploit normalised compression distance (NCD), an information theoretic measure, applied directly to binaries. Given a zoo of labelled malware and benign-ware, we ask whether a suspect program is more similar to our malware or to our benign-ware. Our approach classifies malware with 97.1% accuracy and a false positive rate of 3%. We achieve our results with off-the-shelf compressors and a standard machine learning classifier, without any specialised knowledge. An end-user need only collect a zoo of malware and benign-ware and then can immediately apply our techniques. We apply statistical rigour to our experiments and our selection of data. We demonstrate that accuracy can be optimised by combining NCD with the compressibility rates of the executables. We demonstrate that malware reported within a narrow time frame of a few days is more homogenous than malware reported over a longer one of two years, but that our method still classifies the latter with 95.2% accuracy and a 5% false positive rate. Due to the use of compression, the time and computation cost of our method is non-trivial. We show that simple approximation techniques can improve the time complexity of our approach by up to 63%. We compare our results to those of applying the 59 anti-malware programs used on the VirusTotal web site to our malware: our approach does better than any single one of them, and as well as the 59 used collectively.
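To make the measure in the abstract concrete, here is a minimal Python sketch of NCD and of the compressibility rate, using the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. Several things here are assumptions for illustration and are not taken from the paper: the choice of LZMA as the off-the-shelf compressor, the function names, and the `nearest_label` helper. That helper is a plain nearest-neighbour rule, which is a deliberate simplification and not the machine learning classifier the authors used.

```python
import lzma


def compressed_size(data: bytes) -> int:
    """Length in bytes of the LZMA-compressed form of data.

    LZMA is an assumed choice of compressor; any off-the-shelf
    compressor with a large enough window could be substituted.
    """
    return len(lzma.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their content;
    values near 1 suggest they share very little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def compressibility(data: bytes) -> float:
    """Compressed length divided by original length (non-empty input)."""
    return compressed_size(data) / len(data)


def nearest_label(suspect: bytes, zoo: list[tuple[str, bytes]]) -> str:
    """Hypothetical helper: label of the zoo sample closest to suspect.

    This nearest-neighbour rule only illustrates "more similar to our
    malware or to our benign-ware"; it is not the paper's classifier.
    """
    label, _ = min(zoo, key=lambda item: ncd(suspect, item[1]))
    return label
```

In practice, the NCD values of a suspect binary against each zoo sample, together with its compressibility rate, could be passed as features to a standard classifier. Every call compresses a concatenated pair of files, which is consistent with the non-trivial cost the abstract mentions.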

References (9)
Olatz Arbelaitz, Iñigo Perona, Ibai Gurrutxaga, José I. Martín, Javier Muguerza, Jesús Ma Pérez. Evaluation of malware clustering based on its dynamic behaviour. Australasian Data Mining Conference, pp. 163-170 (2008).
Manuel Alfonseca, Manuel Cebrián, Alfonso Ortega. Common Pitfalls Using the Normalized Compression Distance: What to Watch Out for in a Compressor. Communications in Information and Systems, vol. 5, pp. 367-384 (2005). DOI: 10.4310/CIS.2005.V5.N4.A1
R. Cilibrasi, P.M.B. Vitanyi. Clustering by compression. IEEE Transactions on Information Theory, vol. 51, pp. 1523-1545 (2005). DOI: 10.1109/TIT.2005.844059
Tao Gong, Xiaobin Tan, Ming Zhu. Malware Detection via Classifying with Compression. International Conference on Information Science and Engineering, pp. 1765-1768 (2009). DOI: 10.1109/ICISE.2009.726
Acar Tamersoy, Kevin Roundy, Duen Horng Chau. Guilt by association: large scale malware detection by mining file-relation graphs. Knowledge Discovery and Data Mining, pp. 1524-1533 (2014). DOI: 10.1145/2623330.2623342
Fahim H. Abbasi, R. J. Harris. Intrusion detection in Honeynets by compression and hashing. Australasian Telecommunication Networks and Applications Conference, pp. 96-101 (2010). DOI: 10.1109/ATNAC.2010.5680264
Leo Breiman. Random Forests. Machine Learning, vol. 45, pp. 5-32 (2001). DOI: 10.1023/A:1010933404324