Authors: Yuval Elovici, Robert Moskovitch
DOI:
Keywords:
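Abstract: The present invention is directed to a method for detecting unknown malicious code, such as a virus, a worm, a Trojan Horse, or any combination thereof. A Data Set is created, which is a collection of files that includes a first subset with malicious code and a second subset with benign code; the malicious and benign files are identified by an antivirus program. All files are parsed using n-gram moving windows of several lengths, and the TF representation is computed for each n-gram in each file. An initial set of top features (e.g., up to 5500) of all n-grams is selected based on the DF measure, and the number of top features is reduced, using feature selection methods, to comply with the computation resources required for classifier training. The optimal number of features is then determined based on an evaluation of the detection accuracy of several sets of reduced top features. Based on the optimal number, different data sets with different distributions of benign and malicious files are prepared, which will be used as training and test sets. For each classifier, the detection accuracy is iteratively evaluated for all combinations of training and test set distributions: in each iteration, a classifier is trained using a specific distribution and the trained classifier is tested on all distributions. The distribution that results in the highest detection accuracy is selected for that classifier.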
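The feature-extraction steps named in the abstract (n-gram moving window, TF representation, DF-based selection of top features) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the byte-level window, the normalization of TF by the most frequent n-gram in the file, and the toy byte strings are all assumptions made for illustration. Classifier training and the evaluation over training and test distributions are not covered.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> list:
    """Slide a window of n bytes over the file contents, one byte at a time."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def term_frequencies(data: bytes, n: int) -> dict:
    """TF of each n-gram in one file.

    Assumption: counts are normalized by the count of the most frequent
    n-gram in the same file; the abstract does not specify the normalization.
    """
    counts = Counter(byte_ngrams(data, n))
    if not counts:
        return {}
    peak = max(counts.values())
    return {gram: count / peak for gram, count in counts.items()}


def top_features_by_df(files: list, n: int, k: int) -> list:
    """Keep the k n-grams that occur in the largest number of files (DF)."""
    df = Counter()
    for data in files:
        # set() so that each file contributes at most 1 to an n-gram's DF
        df.update(set(byte_ngrams(data, n)))
    # Sort by DF descending; break ties by byte value so the result is deterministic
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:k]]


def vectorize(data: bytes, features: list, n: int) -> list:
    """Represent one file as the TF values of the selected features."""
    tf = term_frequencies(data, n)
    return [tf.get(gram, 0.0) for gram in features]


# Hypothetical toy "files"; a real Data Set would hold labeled executables.
files = [b"\x90\x90\x90\xcc", b"\x90\x90\xcc\xcc", b"\x00\x01\x90\x90"]
features = top_features_by_df(files, n=2, k=2)
vectors = [vectorize(f, features, n=2) for f in files]
```

With the abstract's figures, `k` would start at up to 5500 and then be reduced by a further feature selection step before the vectors are passed to a classifier.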