Authors: Duen Horng Chau, Carey Nachenberg, Christos Faloutsos, Adam Wright, Jeffrey Wilhelm
DOI:
Keywords:
Abstract: We present Polonium, a scalable and effective technology for detecting malware. We evaluated it with the largest anonymized file submissions dataset ever published, which spans over 60 terabytes of disk space. We formulated the problem of detecting malware as a large-scale graph mining and inference task: from our data we construct a huge bipartite graph of almost 1 billion nodes, of which 48 million are users and 903 million are files. The edges, each denoting a file appearing on a machine, number more than 37 billion. Our method of identifying malware is to locate files with low reputation. The Polonium algorithm computes reputation with the fast Belief Propagation algorithm, whose cost per iteration is linear in the number of edges (O(|E|)) and which iteratively improves the quality of the reputation estimates. With one iteration, Polonium attained an 85% true positive rate in detecting malware. With more iterations, it lifted that rate by an additional 2 percentage points, a significant improvement given that the baseline performance is already very good. We detail important design and implementation features that enable Polonium's successful application to this dataset. We also report empirical observations of the characteristics and patterns in this large billion-node graph.
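To make the reputation-by-inference idea concrete, here is a minimal sketch of loopy Belief Propagation on a small machine-file bipartite graph. It is not Polonium's implementation: the function name `belief_propagation`, the two states (good/bad), the homophily edge potential with parameter `eps`, the default prior of 0.5 for unlabeled nodes, and the toy node names are all hypothetical choices for illustration. Each iteration touches every edge a constant number of times, which is where the O(|E|) per-iteration cost comes from.

```python
import math
from collections import defaultdict


def belief_propagation(edges, priors, eps=0.1, iterations=5):
    """Hypothetical loopy-BP sketch on a machine-file bipartite graph.

    edges: iterable of (machine_id, file_id) pairs; ids must differ across sides.
    priors: dict node -> prior P(good), strictly in (0, 1); others default to 0.5.
    Returns dict node -> belief P(good), used here as the node's reputation.
    """
    neighbors = defaultdict(set)  # undirected adjacency; sets drop duplicate edges
    for machine, file_id in edges:
        neighbors[machine].add(file_id)
        neighbors[file_id].add(machine)

    # Assumed homophily potential psi[x_i][x_j]; state 0 = good, 1 = bad.
    psi = [[0.5 + eps, 0.5 - eps], [0.5 - eps, 0.5 + eps]]

    def log_phi(node):
        p = priors.get(node, 0.5)
        if not 0.0 < p < 1.0:
            raise ValueError(f"prior for {node!r} must be in (0, 1)")
        return [math.log(p), math.log(1.0 - p)]

    def log_total(node, msgs):
        # log prior plus the log of every incoming message, summed once per node
        total = log_phi(node)
        for k in neighbors[node]:
            m = msgs[(k, node)]
            total[0] += math.log(m[0])
            total[1] += math.log(m[1])
        return total

    # msgs[(i, j)] is the normalized message from i to j, initialized uniform.
    msgs = {(i, j): (0.5, 0.5) for i in neighbors for j in neighbors[i]}

    for _ in range(iterations):
        totals = {i: log_total(i, msgs) for i in neighbors}  # O(|E|) pass
        new_msgs = {}
        for (i, j) in msgs:  # O(|E|): each edge is visited in both directions
            back = msgs[(j, i)]
            # Divide out j's own message so i does not echo it back to j.
            a = [totals[i][x] - math.log(back[x]) for x in (0, 1)]
            top = max(a)
            w = [math.exp(v - top) for v in a]  # rescaled to avoid underflow
            out = [sum(w[xi] * psi[xi][xj] for xi in (0, 1)) for xj in (0, 1)]
            s = out[0] + out[1]
            new_msgs[(i, j)] = (out[0] / s, out[1] / s)
        msgs = new_msgs  # synchronous update

    beliefs = {}
    for i in neighbors:
        t = log_total(i, msgs)
        top = max(t)
        w = [math.exp(v - top) for v in t]
        beliefs[i] = w[0] / (w[0] + w[1])
    return beliefs


# Toy usage with made-up ids: one known-good and one known-bad file.
edges = [("M1", "good.dll"), ("M1", "unknown_b"), ("M3", "good.dll"),
         ("M3", "unknown_b"), ("M2", "bad.exe"), ("M2", "unknown_a")]
priors = {"good.dll": 0.9, "bad.exe": 0.1}
rep = belief_propagation(edges, priors)
```

In this sketch `unknown_a`, which shares machine `M2` with the known-bad file, ends up with reputation below 0.5, while `unknown_b` ends up above it. With `iterations=1`, `unknown_a` stays at exactly 0.5, because in this synchronous scheme evidence moves one hop per iteration and `bad.exe` is two hops away; this only illustrates why extra iterations can change results and is not a claim about how Polonium's own gains arise.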