ILAB: An Interactive Labelling Strategy for Intrusion Detection

Authors: Anaël Beaugnon, Pierre Chifflier, Francis Bach

DOI: 10.1007/978-3-319-66332-6_6

Keywords: Open source, Annotation, Intrusion detection system, Artificial intelligence, Labelling, Workload, NetFlow, Machine learning, Active learning (machine learning), Scalability, Computer science

Abstract: Acquiring a representative labelled dataset is a hurdle that has to be overcome to learn a supervised detection model. Labelling a dataset is particularly expensive in computer security, as expert knowledge is required to perform the annotations. In this paper, we introduce ILAB, a novel interactive labelling strategy that helps experts label large datasets for intrusion detection with a reduced workload. First, we compare ILAB with two state-of-the-art labelling strategies on public labelled datasets and demonstrate that it is both an effective and a scalable solution. Second, we show that ILAB is workable with a real-world annotation project carried out on a large unlabelled NetFlow dataset originating from a production environment. We provide an open source implementation (https://github.com/ANSSI-FR/SecuML/) to allow security experts to label their own datasets and researchers to compare labelling strategies.
