A Semi-supervised Learning Methodology for Malware Categorization using Weighted Word Embeddings

Authors: Hugo Leonardo Duarte-Garcia, Carlos Domenick Morales-Medina, Aldo Hernandez-Suarez, Gabriel Sanchez-Perez, Karina Toscano-Medina

DOI: 10.1109/EUROSPW.2019.00033

Keywords: Categorization, Feature extraction, Machine learning, Word2vec, Computer science, Artificial intelligence, Word embedding, Semi-supervised learning, Unsupervised learning, Cluster analysis, Malware

Abstract: Due to the vertiginous growth of malicious actors, malware has been crafted, distributed and propagated around the world with new and sophisticated techniques. Classical detection procedures, mostly based on signatures and heuristic searches, are now being replaced by machine learning-based (ML) solutions. However, some challenges are still present. Firstly, supervised approaches use anti-virus tags to create hand-crafted datasets, resulting in a lack of taxonomy and in uncertainty about whether a given observation is classified with the proper label. Secondly, off-line feed-forward approaches may result in complex and time-consuming feature extraction tasks. In this work, we propose a novel method that reinforces malware characterization by capturing rich relevance and contextual patterns into an n-dimensional weighted word embedding vector (WEV) space. Results show that, by clustering similar WEVs via unsupervised learning, malware can be categorized into four major families, an improvement achieved with fewer resources.
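The idea of clustering weighted word embedding vectors can be illustrated with a minimal sketch. The abstract does not state the weighting scheme or the clustering algorithm, so the code below assumes TF-IDF weights and plain k-means. The token names, the two-dimensional `TOY_VECTORS` (standing in for trained Word2vec vectors), and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed inverse document frequency for every token (assumed weighting)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def weighted_embedding(doc, vectors, idf):
    """TF-IDF-weighted average of the token vectors of one sample."""
    dim = len(next(iter(vectors.values())))
    acc, total = [0.0] * dim, 0.0
    for tok, tf in Counter(doc).items():
        if tok not in vectors:
            continue  # skip tokens without an embedding
        w = tf * idf.get(tok, 0.0)
        acc = [a + w * v for a, v in zip(acc, vectors[tok])]
        total += w
    return [a / total for a in acc] if total else acc


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels


# Hypothetical toy embeddings; a real pipeline would train Word2vec instead.
TOY_VECTORS = {
    "CreateFile": [1.0, 0.0],
    "WriteFile": [0.9, 0.1],
    "connect": [0.0, 1.0],
    "send": [0.1, 0.9],
    "Sleep": [0.5, 0.5],
}

# Hypothetical token sequences, one per sample.
samples = [
    ["CreateFile", "WriteFile", "Sleep"],
    ["connect", "send", "Sleep"],
    ["WriteFile", "CreateFile", "WriteFile"],
    ["send", "connect", "send", "Sleep"],
]

idf = idf_weights(samples)
wevs = [weighted_embedding(s, TOY_VECTORS, idf) for s in samples]
labels = kmeans(wevs, k=2)  # the paper reports four families; k=2 suits the toy data
print(labels)
```

In this toy setup the file-oriented samples and the network-oriented samples fall into separate clusters. Seeding the centroids with the first `k` points keeps the example deterministic, but a practical run would use a more robust initialization.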
