Problems of KDD Cup 99 Dataset Existed and Data Preprocessing

Authors: Yan Wang, Kun Yang, Xiang Jing, Huang Long Jin

DOI: 10.4028/WWW.SCIENTIFIC.NET/AMM.667.218

Keywords: Process (engineering), Intrusion detection system, Data mining, Attack model, Training set, Benchmark (computing), De facto, Data pre-processing, Intrusion, Engineering

Abstract: The KDD Cup 99 dataset is not only the most widely used dataset in intrusion detection but also the de facto benchmark for evaluating the performance of intrusion detection systems. Nevertheless, the dataset has many issues that cannot be ignored. To build good data mining models and find features that characterize each type of network attack, researchers need a thorough understanding of the dataset. In this paper, we first make an in-depth analysis of the existing problems and give related solutions. Second, we carry out extensive preprocessing on the 10% subset of the dataset's training set, which gives better results in the subsequent processing. Finally, by comparing 10 common algorithms in our experiment, we analyze and conclude that data preprocessing plays a vital role in the performance of these algorithms.
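
The abstract does not say which preprocessing steps were applied, so the following is only an illustrative sketch of steps commonly used on KDD Cup 99 style records, not the authors' procedure. It assumes three steps (dropping exact duplicate records, one-hot encoding symbolic fields, and min-max scaling numeric fields) and uses a toy five-column record layout with made-up values. The names `RAW_ROWS`, `drop_duplicates`, and `preprocess` are hypothetical.

```python
# Toy records with a handful of KDD-style columns (hypothetical values):
# duration, protocol_type, service, src_bytes, label
RAW_ROWS = [
    "0,tcp,http,181,normal.",
    "0,tcp,http,181,normal.",     # exact duplicate of the first record
    "0,icmp,ecr_i,1032,smurf.",
    "0,icmp,ecr_i,1032,smurf.",   # exact duplicate of the third record
    "2,udp,domain_u,45,normal.",
]

NUMERIC_COLS = (0, 3)    # duration, src_bytes
SYMBOLIC_COLS = (1, 2)   # protocol_type, service
LABEL_COL = 4


def drop_duplicates(rows):
    """Keep the first occurrence of each record, preserving order."""
    return list(dict.fromkeys(rows))


def preprocess(rows):
    """Return (feature_vectors, labels) for the given comma-separated rows."""
    records = [row.split(",") for row in drop_duplicates(rows)]

    # Min-max scale each numeric column to [0, 1]; constant columns map to 0.
    scaled = {}
    for col in NUMERIC_COLS:
        values = [float(rec[col]) for rec in records]
        low, high = min(values), max(values)
        span = high - low
        scaled[col] = [(v - low) / span if span else 0.0 for v in values]

    # Build a sorted vocabulary per symbolic column for one-hot encoding.
    vocab = {col: sorted({rec[col] for rec in records}) for col in SYMBOLIC_COLS}

    features, labels = [], []
    for i, rec in enumerate(records):
        vector = [scaled[col][i] for col in NUMERIC_COLS]
        for col in SYMBOLIC_COLS:
            vector.extend(1.0 if rec[col] == value else 0.0 for value in vocab[col])
        features.append(vector)
        labels.append(rec[LABEL_COL].rstrip("."))  # strip the trailing period
    return features, labels


features, labels = preprocess(RAW_ROWS)
```

On the toy input, the five rows collapse to three unique records, and each record becomes an 8-element vector (2 scaled numeric values plus 3 + 3 one-hot slots). Real KDD Cup 99 records have 41 features, so the column indices above would need to be adapted.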
