A supervised topic transition model for detecting malicious system call sequences

Authors: Han Xiao, Thomas Stibor

DOI: 10.1145/2023568.2023577

Keywords:

Abstract: We propose a probabilistic model for behavior-based malware detection that jointly models sequential data and class labels. Given labeled sequences (harmless/malicious), our goal is to reveal behavior patterns and exploit them to predict the labels of unknown sequences. The proposed model is a novel extension of supervised latent Dirichlet allocation, with an estimation algorithm that alternates between Gibbs sampling and gradient descent. Experiments on a real-world data set show that the model can learn meaningful patterns and provides competitive performance on the detection task. Moreover, we parallelize the training algorithm and demonstrate its scalability with varying numbers of processors.

References (14)
Michal Rosen-Zvi, Yair Weiss, Amit Gruber. Hidden Topic Markov Models. International Conference on Artificial Intelligence and Statistics, pp. 163-170 (2007).
Thomas Hofmann. Probabilistic latent semantic analysis. Uncertainty in Artificial Intelligence, vol. 15, pp. 289-296 (1999).
David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, vol. 3, pp. 993-1022 (2003). DOI: 10.5555/944919.944937
Steven A. Hofmeyr, Stephanie Forrest, Anil Somayaji. Intrusion detection using sequences of system calls. Journal of Computer Security, vol. 6, pp. 151-180 (1998). DOI: 10.3233/JCS-980109
Jon D. McAuliffe, David M. Blei. Supervised Topic Models. Neural Information Processing Systems, vol. 20, pp. 121-128 (2007).
Hanna M. Wallach. Topic modeling. Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 977-984 (2006). DOI: 10.1145/1143844.1143967
Mark Steyvers, Thomas L. Griffiths, Joshua B. Tenenbaum, David M. Blei. Integrating Topics and Syntax. Neural Information Processing Systems, vol. 17, pp. 537-544 (2004).
Marcus A. Maloof, J. Zico Kolter. Learning to Detect and Classify Malicious Executables in the Wild. Journal of Machine Learning Research, vol. 7, pp. 2721-2744 (2006). DOI: 10.5555/1248547.1248646
Mark Girolami, Ata Kabán. Sequential Activity Profiling: Latent Dirichlet Allocation of Markov Chains. Data Mining and Knowledge Discovery, vol. 10, pp. 175-196 (2005). DOI: 10.1007/S10618-005-0362-2
Padhraic Smyth, David Newman, Max Welling, Arthur U. Asuncion. Distributed Inference for Latent Dirichlet Allocation. Neural Information Processing Systems, vol. 20, pp. 1081-1088 (2007).