Authors: Mohammad M Masud, Tahseen M Al-Khateeb, Kevin W Hamlen, Jing Gao, Latifur Khan
Keywords: Machine learning, Feature extraction, Intrusion detection system, Data stream, Scalability, Data stream mining, Cloud computing, Synthetic data, Artificial intelligence, Cloud computing architecture, Data mining, Computer science, General Computer Science, Management information systems
Abstract: Data stream classification for intrusion detection poses at least three major challenges. First, these data streams are typically infinite-length, making traditional multipass learning algorithms inapplicable. Second, they exhibit significant concept-drift as attackers react and adapt to defenses. Third, for data streams that do not have any fixed feature set, such as text streams, an additional feature extraction and selection task must be performed. If the number of candidate features is too large, then traditional feature extraction techniques fail. In order to address the first two challenges, this article proposes a multipartition, multichunk ensemble classifier in which a collection of v classifiers is trained from r consecutive data chunks using v-fold partitioning of the data, yielding an ensemble of such classifiers. This multipartition, multichunk ensemble technique significantly reduces classification error compared to existing single-partition, single-chunk ensemble approaches, wherein a single data chunk is used to train each classifier. To address the third challenge, a feature extraction and selection technique is proposed for data streams that do not have any fixed feature set. The technique's scalability is demonstrated through an implementation for the Hadoop MapReduce cloud computing architecture. Both theoretical and empirical evidence demonstrate its effectiveness over other state-of-the-art stream classification techniques on synthetic data, real botnet traffic, and malicious executables.
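
The multipartition, multichunk training step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`train_centroid`, `train_multipartition`, `vote`), the nearest-centroid base learner, the majority vote, and the toy data are not taken from the article, which does not specify them in the abstract. The sketch only shows the general idea of merging r consecutive chunks, splitting the merged data into v folds, and training v classifiers that each leave one fold out.

```python
import random
from collections import Counter


def train_centroid(samples):
    """Hypothetical stand-in base learner: one mean vector per class."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    centroids = {lbl: [s / counts[lbl] for s in acc] for lbl, acc in sums.items()}

    def predict(features):
        # Return the label whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda lbl: sum((a - b) ** 2 for a, b in zip(features, centroids[lbl])),
        )

    return predict


def train_multipartition(chunks, v, seed=0):
    """Merge r consecutive chunks, split them into v folds, and train
    v classifiers, each on all folds except one (assumed reading of the abstract)."""
    data = [sample for chunk in chunks for sample in chunk]
    random.Random(seed).shuffle(data)
    folds = [data[i::v] for i in range(v)]
    classifiers = []
    for held_out in range(v):
        training = [s for j, fold in enumerate(folds) if j != held_out for s in fold]
        classifiers.append(train_centroid(training))
    return classifiers


def vote(classifiers, features):
    """Combine the classifiers by simple majority vote (an assumption)."""
    return Counter(clf(features) for clf in classifiers).most_common(1)[0][0]


# Toy data: r = 2 chunks, each with 6 "benign" and 6 "malicious" points.
chunks = [
    [((0.1 * i, 0.0), "benign") for i in range(6)]
    + [((5.0 + 0.1 * i, 5.0), "malicious") for i in range(6)]
    for _ in range(2)
]
ensemble = train_multipartition(chunks, v=3)
print(vote(ensemble, (0.2, 0.1)))
```

In a streaming setting, a batch like `ensemble` would presumably be retrained as each new chunk arrives and older or weaker classifiers discarded to track concept-drift; that replacement policy is not described in the abstract and is left out of the sketch.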