A Better Understanding of Machine Learning Malware Misclassification

Authors: Nada Alruhaily, Tom Chothia, Behzad Bordbar

DOI: 10.1007/978-3-319-93354-2_3

Keywords: Behavioural analysis, Malware, Artificial intelligence, Computer science, Detection rate, Machine learning

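Abstract: Machine learning-based malware detection systems have been widely suggested and used as a replacement for signature-based detection methods. Such systems have shown that they can provide a high detection rate when recognising previously unseen malware samples. However, when classifying malware based on their behavioural features, some new malware can go undetected, resulting in misclassification. Our aim is to gain more understanding of the underlying causes of malware misclassification; this will help to develop more robust malware detection systems. Towards this objective, several questions are addressed in this paper: Does misclassification increase over a period of time? Do changes that affect classification occur in malware at the level of families, where all instances that belong to certain families are hard to detect? Alternatively, can such changes be traced back to certain malware variants instead of malware families? Also, does misclassification increase when removing distinct API functions that have been used only by malware? This technique could be used by malware writers to evade detection. Our experiments showed that changes in malware behaviour are mostly due to behavioural changes at the level of variants across malware families, where variants did not behave as expected. They also showed that machine learning-based systems could maintain a high detection rate even when malware tries to evade detection by not using distinct API functions, which are uniquely used by malware.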
