Authors: Kevin Allix, Yves Le Traon, Tegawendé François D Assise Bissyande, Jacques Klein
DOI:
Keywords:
Abstract: Machine Learning-based malware detection is a promising scalable method for identifying suspicious applications. In particular, in today’s mobile computing realm, where thousands of applications are poured into markets daily, such a technique could be valuable to guarantee strong filtering of malicious apps. The success of machine-learning approaches is, however, highly dependent on (1) the quality of the datasets that are used for training and (2) the appropriateness of the tested datasets with regard to the built classifiers. Unfortunately, there is scarce mention of these aspects in the evaluation of existing state-of-the-art approaches in the literature. In this paper, we consider the relevance of history in the construction of datasets, to highlight its impact on the performance of the malware detection scheme. Typically, we show that simply picking a random set of known malware to train a malware detector, as it is done in most assessment scenarios from the literature, yields significantly biased results. In the process of assessing the extent of this impact through various experiments, we were also able to confirm a number of intuitive assumptions about Android malware. For instance, we discuss the existence of Android malware lineages and how they could impact the performance of malware detection in the wild.
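The contrast the abstract draws between random selection and history-aware dataset construction can be illustrated with a minimal sketch. This is not the authors' experimental setup: the app records, the field names (`name`, `date`, `label`), the cutoff date, and all three helper functions are hypothetical and exist only to show the idea. A random split may place apps in the training set that are newer than apps in the test set, whereas a history-aware split trains only on apps that precede the test period.

```python
import random
from datetime import date

# Hypothetical records: one app per month of 2012, alternating labels.
APPS = [
    {"name": f"app{m:02d}", "date": date(2012, m, 1),
     "label": "malware" if m % 2 else "goodware"}
    for m in range(1, 13)
]

def random_split(apps, train_fraction, seed=0):
    """Shuffle the apps with no regard to their dates, then cut."""
    shuffled = list(apps)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def history_aware_split(apps, cutoff):
    """Train on apps dated before the cutoff, test on the rest."""
    train = [a for a in apps if a["date"] < cutoff]
    test = [a for a in apps if a["date"] >= cutoff]
    return train, test

def leaks_future(train, test):
    """True if some training app is newer than the oldest test app."""
    if not train or not test:
        return False
    newest_train = max(a["date"] for a in train)
    oldest_test = min(a["date"] for a in test)
    return newest_train > oldest_test

if __name__ == "__main__":
    r_train, r_test = random_split(APPS, 0.5)
    h_train, h_test = history_aware_split(APPS, date(2012, 7, 1))
    print("random split leaks future apps:", leaks_future(r_train, r_test))
    print("history-aware split leaks future apps:", leaks_future(h_train, h_test))
```

By construction, the history-aware split never reports a leak, while the random split usually does. That leak is one plausible reading of how randomly built training sets could inflate reported detection performance.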