A Semi-supervised Learning Methodology for Malware Categorization using Weighted Word Embeddings

Authors: Hugo Leonardo Duarte-Garcia, Carlos Domenick Morales-Medina, Aldo Hernandez-Suarez, Gabriel Sanchez-Perez, Karina Toscano-Medina

DOI: 10.1109/EUROSPW.2019.00033

Keywords: Categorization, Feature extraction, Machine learning, Word2vec, Computer science, Artificial intelligence, Word embedding, Semi-supervised learning, Unsupervised learning, Cluster analysis, Malware

Abstract: Due to the vertiginous growth of malicious actors, malware has been crafted, distributed and propagated around the world with new, sophisticated techniques. Classical detection procedures, mostly based on signatures and heuristic searches, are now being replaced by machine learning-based (ML) solutions. However, some challenges are still present. First, supervised approaches use anti-virus tags to create hand-crafted datasets, resulting in a lack of taxonomy and in uncertainty about whether a given observation is classified with the proper label. Second, off-line feed-forward approaches may result in complex and time-consuming feature extraction tasks. In this work, we propose a novel method that reinforces characterization by capturing rich relevance and contextual patterns in an n-dimensional weighted word embedding vector (WEV) space. Results show that, by clustering similar WEVs via unsupervised learning, malware can be categorized into four major families, an improvement achieved with fewer resources.
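The idea of weighting word embeddings and then clustering them can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not specify the weighting scheme or the clustering algorithm, so TF-IDF weighting and plain k-means are assumptions here. The API-call tokens, the 2-d vectors (stand-ins for trained Word2vec embeddings) and all function names are hypothetical, and k=2 is used only to keep the toy data small, whereas the abstract reports four families.

```python
import math
from collections import Counter

# Hypothetical toy data: each sample is a sequence of API-call tokens.
samples = [
    ["CreateFile", "WriteFile", "CryptEncrypt", "DeleteFile"],
    ["CreateFile", "CryptEncrypt", "CryptEncrypt", "WriteFile"],
    ["socket", "connect", "send", "CreateFile"],
    ["socket", "connect", "recv", "send"],
]

# Hypothetical 2-d vectors standing in for trained Word2vec embeddings.
embeddings = {
    "CreateFile": (0.5, 0.5),
    "WriteFile": (0.9, 0.2),
    "DeleteFile": (0.8, 0.1),
    "CryptEncrypt": (1.0, 0.0),
    "socket": (0.1, 0.9),
    "connect": (0.0, 1.0),
    "send": (0.2, 0.8),
    "recv": (0.1, 1.0),
}

def idf_weights(docs):
    """Smoothed inverse document frequency for every token (assumed weighting)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log(n / count) + 1.0 for tok, count in df.items()}

def weighted_embedding(doc, idf, vectors):
    """TF-IDF-weighted average of the token vectors of one sample."""
    tf = Counter(doc)
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        w = count * idf[tok]
        weight_sum += w
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
    return tuple(x / weight_sum for x in total)

def kmeans(points, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

idf = idf_weights(samples)
wevs = [weighted_embedding(doc, idf, embeddings) for doc in samples]
labels = kmeans(wevs, k=2)
print(labels)  # prints [0, 0, 1, 1]
```

On this toy input the two file-encryption samples fall into one cluster and the two network samples into the other. With real data, the hand-written vectors would be replaced by embeddings learned from the corpus, and the number of clusters would be chosen to match the families under study.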
