VAMO

Authors: Roberto Perdisci, ManChon U

DOI: 10.1145/2420950.2420999

Keywords: Majority rule, Cluster analysis, Quality (business), Set (abstract data type), Malware, Unsupervised learning, Ground truth, Computer science, The Internet, Data mining

Abstract: Malware clustering is commonly applied by malware analysts to cope with the growing number of distinct malware variants collected every day from the Internet. While malware clustering systems can be useful for a variety of applications, assessing the quality of their results is intrinsically hard. In fact, malware clustering can be viewed as an unsupervised learning process over a dataset for which complete ground truth is usually not available. Previous studies propose to evaluate malware clustering results by leveraging the labels assigned to the malware samples by multiple anti-virus scanners (AVs). However, the methods proposed thus far require a (semi-)manual adjustment and mapping between the labels generated by different AVs, and are limited to selecting a reference subset of samples for which agreement regarding their labels can be reached across a majority of AVs. This approach may bias the reference set towards "easy to cluster" malware samples, potentially resulting in an overoptimistic estimate of the accuracy of the clustering results.

In this paper we present VAMO, a system that provides a fully automated quantitative analysis of the validity of malware clustering results. Unlike previous work, VAMO does not seek a majority voting-based consensus among different AV labels, and does not discard the samples for which such a consensus cannot be reached. Rather, VAMO explicitly deals with the inconsistencies typical of multiple AV labels to build a more representative reference set, compared to previous approaches. Furthermore, VAMO avoids the (semi-)manual label mapping that was required in previous work. Through an extensive evaluation in a controlled setting and a real-world application, we show that VAMO outperforms previous majority voting-based approaches and provides a better way to automatically assess the quality of malware clustering results.
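To make the bias described above concrete, the following is a minimal sketch of the majority voting-based baseline that the abstract criticizes; it is not VAMO's method. It assumes the AV labels have already been mapped to common family names (the (semi-)manual step), treats "majority" as strictly more than half of the scanners, and scores a clustering with one commonly used precision/recall definition, computed only on the reference subset. The function names `majority_reference_set` and `precision_recall`, and the family names `zbot`, `sality` and `virut`, are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def majority_reference_set(
    av_labels: Dict[str, List[Optional[str]]], min_fraction: float = 0.5
) -> Dict[str, str]:
    """Hypothetical baseline: keep a sample only if its most common family
    label is reported by more than `min_fraction` of all scanners.
    Labels are assumed to be already mapped to common family names;
    None means the scanner did not detect the sample."""
    reference = {}
    for sample, labels in av_labels.items():
        votes = Counter(label for label in labels if label is not None)
        if not votes:
            continue  # no scanner detected the sample
        family, count = votes.most_common(1)[0]
        if count > min_fraction * len(labels):
            reference[sample] = family  # consensus reached
        # otherwise the sample is silently discarded
    return reference


def precision_recall(
    clusters: Dict[str, int], reference: Dict[str, str]
) -> Tuple[float, float]:
    """Precision/recall of a clustering, measured only on reference samples
    (assumed metric: best-matching overlap per cluster / per family)."""
    families_in_cluster = defaultdict(Counter)  # cluster id -> family counts
    clusters_of_family = defaultdict(Counter)   # family -> cluster id counts
    for sample, family in reference.items():
        cid = clusters[sample]
        families_in_cluster[cid][family] += 1
        clusters_of_family[family][cid] += 1
    n = len(reference)
    precision = sum(max(c.values()) for c in families_in_cluster.values()) / n
    recall = sum(max(c.values()) for c in clusters_of_family.values()) / n
    return precision, recall


labels = {
    "s1": ["zbot", "zbot", "zbot"],      # all three scanners agree
    "s2": ["zbot", "sality", None],      # scanners disagree, one misses it
    "s3": ["virut", "virut", "sality"],  # two of three agree
}
ref = majority_reference_set(labels)
print(ref)  # {'s1': 'zbot', 's3': 'virut'}  (s2 is discarded)
print(precision_recall({"s1": 0, "s2": 0, "s3": 1}, ref))  # (1.0, 1.0)
```

In this toy run the contested sample `s2` never enters the reference set, so the clustering's decision about it, right or wrong, has no effect on the score. This is the kind of "easy to cluster" bias the abstract attributes to majority voting-based reference sets.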

References (18)
Federico Maggi, Andrea Bellini, Guido Salvaneschi, Stefano Zanero. Finding non-trivial malware naming inconsistencies. International Conference on Information Systems Security, pp. 144-159, 2011. DOI: 10.1007/978-3-642-25560-1_10
Fanglu Guo, Peter Ferrie, Tzi-cker Chiueh. A Study of the Packer Problem and Its Solutions. Recent Advances in Intrusion Detection, pp. 98-115, 2008. DOI: 10.1007/978-3-540-87403-4_6
Peng Li, Limin Liu, Debin Gao, Michael K. Reiter. On challenges in evaluating malware clustering. Recent Advances in Intrusion Detection, vol. 6307, pp. 238-255, 2010. DOI: 10.1007/978-3-642-15512-3_13
Roberto Perdisci, Nick Feamster, Wenke Lee. Behavioral clustering of HTTP-based malware and signature generation using malicious network traces. Networked Systems Design and Implementation, p. 26, 2010. DOI: 10.5555/1855711.1855737
Konrad Rieck, Philipp Trinius, Carsten Willems, Thorsten Holz. Automatic analysis of malware behavior using machine learning. Journal of Computer Security, vol. 19, pp. 639-668, 2011. DOI: 10.3233/JCS-2010-0410
Ulrich Bayer, Paolo Milani Comparetti, Clemens Hlauschek, Christopher Kruegel, Engin Kirda. Scalable, behavior-based malware clustering. Network and Distributed System Security Symposium, 2009.
Richard C. Dubes, Anil K. Jain. Algorithms for Clustering Data. 1988.
A. K. Jain, M. N. Murty, P. J. Flynn. Data clustering: a review. ACM Computing Surveys, vol. 31, pp. 264-323, 1999. DOI: 10.1145/331499.331504
Jiyong Jang, David Brumley, Shobha Venkataraman. BitShred. Proceedings of the 18th ACM Conference on Computer and Communications Security (CCS '11), pp. 309-320, 2011. DOI: 10.1145/2046707.2046742
E. B. Fowlkes, C. L. Mallows. A Method for Comparing Two Hierarchical Clusterings. Journal of the American Statistical Association, vol. 78, pp. 553-569, 1983. DOI: 10.1080/01621459.1983.10478008