V2W-BERT: A Framework for Effective Hierarchical Multiclass Classification of Software Vulnerabilities

Authors: Siddhartha Shankar Das, Edoardo Serra, Mahantesh Halappanavar, Alex Pothen, Ehab Al-Shaer

DOI:

Keywords: Multiclass classification, Computer science, National Vulnerability Database, Common Vulnerabilities and Exposures, Software, Machine learning, Vulnerability, Artificial intelligence, Protocol (object-oriented programming)

Abstract: Weaknesses in computer systems, such as faults, bugs, and errors in the architecture, design, or implementation of software, provide vulnerabilities that attackers can exploit to compromise the security of a system. Common Weakness Enumerations (CWE) are a hierarchically designed dictionary of software weaknesses that provides a means to understand software flaws, the potential impact of their exploitation, and ways to mitigate these flaws. Common Vulnerabilities and Exposures (CVE) are brief, low-level descriptions that uniquely identify vulnerabilities in a specific product or protocol. Classifying, or mapping, CVEs to CWEs provides a means to understand the impact of these vulnerabilities and to mitigate them. Since manual mapping is not a viable option, automated approaches are desirable but challenging. In this paper, we present a novel Transformer-based learning framework, V2W-BERT. By using ideas from natural language processing, link prediction, and transfer learning, our method outperforms previous approaches not only for CWE instances with abundant training data but also for rare CWE classes with little or no training data. Our approach also shows significant improvements in using historical data to predict links for future CVEs and therefore provides a viable approach for practical applications. Using data from MITRE and the National Vulnerability Database, we achieve up to 97% accuracy on randomly partitioned data and up to 94% accuracy on temporally partitioned data. We believe our work will influence the design of better methods and training models, as well as applications that solve increasingly hard problems in cybersecurity.
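The abstract describes drawing on link prediction to map CVEs onto the CWE hierarchy. As a rough illustration of that framing only (this is not V2W-BERT itself), the sketch below scores a candidate link between one CVE description and every CWE node, picks the highest-scoring CWE, and reports its ancestors in the hierarchy. Several assumptions are made loudly here: a bag-of-words cosine similarity stands in for the BERT embeddings the paper uses, the three-node hierarchy and its short descriptions are a hand-written toy rather than an extract of the official CWE catalog, and the function names (`embed`, `cosine`, `link_scores`, `ancestors`, `predict`) are hypothetical.

```python
import math
import re
from collections import Counter

# Toy CWE hierarchy (child -> parent). Hand-written for illustration only;
# this is NOT an extract of the official CWE catalog.
CWE_PARENT = {"CWE-79": "CWE-74", "CWE-89": "CWE-74", "CWE-74": None}

# Hypothetical short descriptions standing in for the CWE text a real model would embed.
CWE_TEXT = {
    "CWE-74": "injection of special elements into output used by a downstream component",
    "CWE-79": "cross-site scripting script injected into a web page viewed by other users",
    "CWE-89": "sql injection special elements in an sql query sent to the database",
}


def embed(text: str) -> Counter:
    """Bag-of-words term counts; an assumed stand-in for a BERT sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link_scores(cve_text: str) -> dict[str, float]:
    """Score the candidate (CVE, CWE) link for every CWE node in the toy hierarchy."""
    cve_vec = embed(cve_text)
    return {cwe: cosine(cve_vec, embed(desc)) for cwe, desc in CWE_TEXT.items()}


def ancestors(cwe: str) -> list[str]:
    """Follow parent links from a CWE node up to the root of the toy hierarchy."""
    path = []
    parent = CWE_PARENT[cwe]
    while parent is not None:
        path.append(parent)
        parent = CWE_PARENT[parent]
    return path


def predict(cve_text: str) -> tuple[str, list[str]]:
    """Return the highest-scoring CWE link and that CWE's ancestors."""
    scores = link_scores(cve_text)
    best = max(scores, key=scores.get)
    return best, ancestors(best)


if __name__ == "__main__":
    cve = ("SQL injection in the login form allows attackers to run "
           "an arbitrary SQL query against the database")
    print(predict(cve))  # prints ('CWE-89', ['CWE-74'])
```

Scoring each (CVE, CWE) pair independently, rather than training a fixed-size softmax over known classes, is what lets a link-prediction formulation rank a CWE that had few or no training CVEs, as long as that CWE has its own description to compare against; a learned model would replace the bag-of-words similarity with trained text embeddings.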
