Cloud-based malware detection for evolving data streams

Authors: Mohammad M Masud, Tahseen M Al-Khateeb, Kevin W Hamlen, Jing Gao, Latifur Khan

DOI: 10.1145/2019618.2019622

Keywords: Machine learning, Feature extraction, Intrusion detection system, Data stream, Scalability, Data stream mining, Cloud computing, Synthetic data, Artificial intelligence, Cloud computing architecture, Data mining, Computer science, General Computer Science, Management information systems

Abstract: Data stream classification for intrusion detection poses at least three major challenges. First, these data streams are typically infinite-length, making traditional multipass learning algorithms inapplicable. Second, they exhibit significant concept-drift as attackers react and adapt to defenses. Third, for data streams that do not have any fixed feature set, such as text streams, an additional feature extraction and selection task must be performed. If the number of candidate features is too large, then traditional feature extraction techniques fail. In order to address the first two challenges, this article proposes a multipartition, multichunk ensemble classifier in which a collection of v classifiers is trained from r consecutive data chunks using v-fold partitioning of the data, yielding an ensemble of such classifiers. This technique significantly reduces classification error compared to existing single-partition, single-chunk ensemble approaches, wherein a single data chunk is used to train each classifier. To address the third challenge, a feature extraction and selection technique is proposed for data streams that do not have any fixed feature set. The technique's scalability is demonstrated through an implementation for the Hadoop MapReduce cloud computing architecture. Both theoretical and empirical evidence demonstrate its effectiveness over other state-of-the-art stream classification techniques on synthetic data, real botnet traffic, and malicious executables.
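
The multipartition, multichunk training step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy nearest-centroid base learner, the function names (`train_on_chunks`, `update_ensemble`, `ensemble_predict`), the random partitioning, the rule of keeping the lowest-error models, and the majority vote are all hypothetical choices made for the example. Only the core idea comes from the abstract: merge r consecutive labeled chunks, split them into v partitions, and train v classifiers, each on v-1 partitions.

```python
import random
from collections import Counter


class CentroidClassifier:
    """Toy base learner (an assumption for illustration): nearest class centroid."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, value in enumerate(xi):
                acc[j] += value
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, x):
        def squared_distance(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=squared_distance)


def error_rate(clf, X, y):
    """Fraction of instances in (X, y) that clf misclassifies."""
    wrong = sum(clf.predict_one(xi) != yi for xi, yi in zip(X, y))
    return wrong / len(y)


def train_on_chunks(chunks, v, rng):
    """Merge r consecutive labeled chunks, split the result into v partitions,
    and train v classifiers, each on v-1 partitions. Each classifier is scored
    on the partition it did not see. Returns a list of (error, classifier)."""
    data = [pair for chunk in chunks for pair in chunk]
    rng.shuffle(data)
    parts = [data[i::v] for i in range(v)]  # v roughly equal partitions
    trained = []
    for held_out in range(v):
        train = [p for i, part in enumerate(parts) if i != held_out for p in part]
        X, y = zip(*train)
        clf = CentroidClassifier().fit(X, y)
        X_test, y_test = zip(*parts[held_out])
        trained.append((error_rate(clf, X_test, y_test), clf))
    return trained


def update_ensemble(ensemble, new_models, max_size):
    """Assumed update rule: keep the max_size models with the lowest error."""
    return sorted(ensemble + new_models, key=lambda m: m[0])[:max_size]


def ensemble_predict(ensemble, x):
    """Assumed combination rule: simple majority vote over the ensemble."""
    votes = Counter(clf.predict_one(x) for _, clf in ensemble)
    return votes.most_common(1)[0][0]
```

In a streaming setting, one would call `train_on_chunks` on the r most recent labeled chunks each time a new chunk arrives, then pass the result to `update_ensemble` so that the ensemble size stays bounded while older, less accurate models are displaced as the concept drifts. Each chunk here is assumed to be a list of `(feature_tuple, label)` pairs.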
