Towards interpreting ML-based automated malware detection models: a survey.

Authors: Yuzhou Lin, Xiaolin Chang

Abstract: Malware is increasingly threatening, and malware detectors based on traditional signature-based analysis are no longer suitable for current malware detection. Recently, models based on machine learning (ML) have been developed for predicting unknown malware variants and saving human effort. However, most existing ML models are black boxes, which makes their prediction results undependable, and they therefore need further interpretation in order to be effectively deployed in the wild. This paper aims to examine and categorize the existing research on ML-based malware detector interpretability. We first give a detailed comparison of previous work on common ML model interpretability in groups, after introducing the principles, attributes, evaluation indicators, and taxonomy of common ML interpretability. Then we investigate interpretation methods for malware detection by addressing the importance of interpreting malware detectors, the challenges faced by this field, solutions for mitigating these challenges, and a new taxonomy for classifying the state-of-the-art malware detection interpretability work of recent years. The highlight of our survey is a new taxonomy of malware detection interpretation methods, based on the common taxonomy summarized by previous research in the common field. In addition, we evaluate the state-of-the-art approaches by interpretation method attributes to generate a final score, so as to give insight into quantifying interpretability. By concluding the results of recent research, we hope our work can provide suggestions for researchers who are interested in the interpretability of ML-based malware detection models.
