Polonium: Tera-Scale Graph Mining for Malware Detection

Authors: Duen Horng Chau, Carey Nachenberg, Christos Faloutsos, Adam Wright, Jeffrey Wilhelm

Abstract: We present Polonium, a scalable and effective technology for detecting malware. We evaluated it with the largest anonymized file submissions dataset ever published, which spans over 60 terabytes of disk space. We formulated the problem of detecting malware as a large-scale graph mining and inference task, in which we construct a huge bipartite graph of almost 1 billion nodes from our data: 48 million are users and 903 million are files. The edges, each denoting a file appearing on a machine, number more than 37 billion. Our method for identifying malware is to locate files with low reputation. The Polonium algorithm computes the reputation of each file based on the fast Belief Propagation algorithm (O(|E|)), which iteratively improves the quality of the reputation estimates. With one iteration, Polonium attained an 85% true positive rate (in detecting malware). With more iterations, it further improved the true positive rate by an additional 2%, a significant improvement given that the baseline performance is already very good. We detail important design and implementation features that enable its successful application on the large dataset. We also present empirical observations on the characteristics and patterns in the large billion-node graph.
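The reputation idea in the abstract can be illustrated with a minimal sketch of loopy Belief Propagation on a tiny machine-file bipartite graph. This is not Polonium's implementation: the abstract does not give the priors, edge potentials, or message schedule, so the function name `belief_propagation`, the homophily strength `epsilon`, the prior values, and the node labels are all assumptions made for illustration. The sketch also recomputes each message from scratch, so it does not reach the O(|E|) per-iteration cost mentioned above.

```python
from collections import defaultdict

GOOD, BAD = 0, 1  # the two states each machine or file can take


def normalize(p):
    """Scale a pair of non-negative numbers so that they sum to 1."""
    total = p[0] + p[1]
    return (p[0] / total, p[1] / total)


def belief_propagation(edges, priors, epsilon=0.1, iterations=5):
    """Return a dict mapping each node to its estimated P(good).

    edges:   iterable of (machine, file) pairs (hypothetical string labels)
    priors:  dict node -> prior P(good); nodes not listed default to 0.5
    epsilon: assumed homophily strength; neighbors tend to share a state
    """
    neighbors = defaultdict(set)
    for machine, file in edges:
        neighbors[machine].add(file)
        neighbors[file].add(machine)

    def prior(node):
        p_good = priors.get(node, 0.5)
        return (p_good, 1.0 - p_good)

    # Edge potential psi[x_i][x_j]: slightly favors equal states.
    psi = ((0.5 + epsilon, 0.5 - epsilon),
           (0.5 - epsilon, 0.5 + epsilon))

    # One message per directed edge, initialized to uniform.
    messages = {(i, j): (0.5, 0.5) for i in neighbors for j in neighbors[i]}

    for _ in range(iterations):
        new_messages = {}
        for (i, j) in messages:
            phi = prior(i)
            # Product of messages into i from every neighbor except j.
            incoming = [1.0, 1.0]
            for k in neighbors[i]:
                if k != j:
                    m = messages[(k, i)]
                    incoming[GOOD] *= m[GOOD]
                    incoming[BAD] *= m[BAD]
            out = tuple(
                sum(phi[xi] * psi[xi][xj] * incoming[xi] for xi in (GOOD, BAD))
                for xj in (GOOD, BAD)
            )
            new_messages[(i, j)] = normalize(out)
        messages = new_messages

    # Belief = prior times all incoming messages, normalized.
    beliefs = {}
    for i in neighbors:
        b = list(prior(i))
        for k in neighbors[i]:
            m = messages[(k, i)]
            b[GOOD] *= m[GOOD]
            b[BAD] *= m[BAD]
        beliefs[i] = normalize(b)[GOOD]
    return beliefs


# Toy data: file A is assumed known-bad, file C assumed known-good.
edges = [
    ("machine:1", "file:A"), ("machine:1", "file:B"), ("machine:1", "file:D"),
    ("machine:2", "file:B"), ("machine:2", "file:C"), ("machine:2", "file:E"),
]
priors = {"file:A": 0.01, "file:C": 0.99}
beliefs = belief_propagation(edges, priors)
for node in sorted(beliefs):
    print(node, round(beliefs[node], 3))
```

In this toy graph, the unlabeled file that shares a machine with the known-bad file ends up with a reputation below 0.5, and the one that shares a machine with the known-good file ends up above 0.5, which is the "locate files with low reputation" behavior the abstract describes.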
