Authors: Yeon-sup Lim, Hyun-chul Kim, Jiwoong Jeong, Chong-kwon Kim, Ted "Taekyoung" Kwon
Keywords: Discriminative model, Data mining, Artificial intelligence, Discretization, Network packet, Machine learning, Entropy (information theory), Traffic classification, Minimum description length, Computer science, The Internet, Statistical classification
Abstract: Recent research on Internet traffic classification has yielded a number of data mining techniques for distinguishing types of traffic, but no systematic analysis of "why" some algorithms achieve high accuracies. In pursuit of empirically grounded answers to this question, which is critical for understanding and establishing a scientific ground for traffic classification research, this paper reveals three sources of discriminative power in classifying application traffic: (i) ports, (ii) the sizes of the first one or two packets (for UDP flows) or the first four or five packets (for TCP flows), and (iii) discretization of those features. We find that C4.5 performs best under any circumstances, and we also find the reason why: the algorithm discretizes input features during its operation. We further find that entropy-based Minimum Description Length (MDL) discretization of the port and packet-size features substantially improves the accuracy of every machine learning algorithm tested (by as much as 59.8%!) and brings all of them above 93% accuracy on average without any algorithm-specific tuning. Our results indicate that treating these features as discrete nominal intervals, not as continuous numbers, is the essential basis for accurate traffic classification (i.e., the features should be discretized first), regardless of the classification algorithm used.
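The abstract attributes much of the accuracy gain to entropy-based MDL discretization but does not spell out the procedure. The following is a minimal Python sketch assuming the widely used Fayyad and Irani (1993) formulation, which may differ in detail from what the authors ran: sort one numeric feature, pick the boundary cut that minimizes weighted class entropy, keep it only if the information gain exceeds the MDL cost, and recurse on both halves. The function names (`entropy`, `best_cut`, `mdl_accepts`, `mdlp_cut_points`, `discretize`) and the toy packet sizes and application labels are hypothetical illustrations, not taken from the paper.

```python
import bisect
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a class-label sequence."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(xs: Sequence[float], ys: Sequence[Hashable]) -> Optional[int]:
    """Index i minimizing the weighted class entropy of ys[:i] and ys[i:].

    Only positions between two distinct (sorted) feature values are candidates.
    """
    n = len(xs)
    best_i, best_e = None, float("inf")
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue
        e = (i / n) * entropy(ys[:i]) + ((n - i) / n) * entropy(ys[i:])
        if e < best_e:
            best_i, best_e = i, e
    return best_i


def mdl_accepts(ys: Sequence[Hashable], i: int) -> bool:
    """Fayyad and Irani MDL stopping test for splitting ys at index i."""
    n = len(ys)
    left, right = ys[:i], ys[i:]
    ent_s, ent_l, ent_r = entropy(ys), entropy(left), entropy(right)
    gain = ent_s - (len(left) / n) * ent_l - (len(right) / n) * ent_r
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * ent_l - k2 * ent_r)
    return gain > (math.log2(n - 1) + delta) / n


def mdlp_cut_points(values: Sequence[float], labels: Sequence[Hashable]) -> List[float]:
    """Recursively collect MDL-accepted cut points for one numeric feature."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    cuts: List[float] = []

    def split(lo: int, hi: int) -> None:
        seg_x, seg_y = xs[lo:hi], ys[lo:hi]
        if len(set(seg_y)) < 2:  # a pure segment needs no further cut
            return
        i = best_cut(seg_x, seg_y)
        if i is None or not mdl_accepts(seg_y, i):
            return
        cuts.append((seg_x[i - 1] + seg_x[i]) / 2)  # midpoint between the two values
        split(lo, lo + i)
        split(lo + i, hi)

    split(0, len(xs))
    return sorted(cuts)


def discretize(value: float, cuts: Sequence[float]) -> int:
    """Map a raw value to the index of its interval, i.e. a nominal bin id."""
    return bisect.bisect_right(cuts, value)


if __name__ == "__main__":
    # Toy, made-up first-packet sizes (bytes) for two hypothetical applications.
    sizes = [40, 40, 40, 40, 52, 52, 52, 52, 1400, 1400, 1460, 1460, 1460, 1500, 1500, 1500]
    apps = ["app_a"] * 8 + ["app_b"] * 8
    cuts = mdlp_cut_points(sizes, apps)
    print(cuts)  # [726.0]
    print([discretize(s, cuts) for s in (40, 1500)])  # [0, 1]
```

Each interval index returned by `discretize` can then be given to any classifier as a nominal feature instead of the raw number, which is the treatment the abstract argues for. A feature whose values do not separate the classes (e.g. alternating labels over a few points) yields no accepted cut and collapses to a single interval.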