Spam Sender Detection with Classification Modeling on Highly Imbalanced Mail Server Behavior Data

Authors: Dmitri Alperovitch, Yuchun Tang, Paul Judge, Sven Krasser

Abstract: Unsolicited commercial or bulk emails, or emails containing viruses, pose a great threat to the utility of email communications. A recent filtering solution is reputation systems, which can assign a trust value to each IP address that sends messages. By analyzing the query patterns of each node that uses reputation information, such a system can calculate a reputation score for each queried IP address. In this research, we explore a behavioral classification approach based on features extracted from these global messaging patterns. Because of the large number of bad senders, this classification task has to cope with highly imbalanced data. First, for each observed sender, we extract periodicity properties using the discrete Fourier transform, along with breadth information reflecting message volume and recipient distribution. Next, the Granular Support Vector Machine - Boundary Alignment algorithm (GSVM-BA) is implemented to solve the class imbalance problem and is compared with cost-sensitive learning. Lastly, we evaluate the performance of support vector machine (SVM), C4.5 decision tree, naïve Bayesian, and multinomial logistic regression classifiers on the resulting data set. The best performance is achieved by using GSVM-BA to rebalance the data set and then applying SVM for classification.
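
The periodicity features can be illustrated with a minimal sketch. It assumes, and the abstract does not state this, that each sender's activity is summarized as a series of per-time-bin message counts, for example counts per hour. A sender that emits in regular bursts then shows a dominant non-zero frequency in its discrete Fourier transform. The function names and the two outputs are hypothetical choices: the outputs are the dominant frequency bin and that bin's share of the non-DC spectral energy.

```python
import cmath
import math
from typing import List, Tuple


def dft_magnitudes(counts: List[float]) -> List[float]:
    """Return |X_k| for X_k = sum_n x_n * exp(-2*pi*i*k*n/N), k = 0..N-1 (direct O(N^2) DFT)."""
    n_bins = len(counts)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_bins) for n, x in enumerate(counts)))
        for k in range(n_bins)
    ]


def periodicity_features(counts: List[float]) -> Tuple[int, float]:
    """Hypothetical features: (dominant non-DC frequency bin, its share of non-DC energy).

    Only bins 1..N//2 are inspected. For a real-valued series the upper half of
    the spectrum mirrors the lower half, and bin 0 (DC) only reflects mean volume.
    """
    if len(counts) < 2 or max(counts) == min(counts):
        return 0, 0.0  # too short or constant: no periodic component
    mags = dft_magnitudes(counts)
    energy = [m * m for m in mags[1 : len(counts) // 2 + 1]]
    best = max(range(len(energy)), key=energy.__getitem__)
    return best + 1, energy[best] / sum(energy)
```

For example, a 24-bin series that oscillates three times per window yields bin 3 as the dominant frequency with almost all of the non-DC energy. The direct DFT is quadratic in the number of bins. A fast Fourier transform would be the usual choice for long series.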
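
The breadth information ("message volume and recipient distribution") is not defined in detail in the abstract. The sketch below assumes a stream of hypothetical (sender IP, recipient) pairs, one per message. From these it computes three per-sender values: total volume, the number of distinct recipients, and the Shannon entropy of the recipient distribution. High entropy means the sender spreads mail across many recipients.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def breadth_features(events: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int, float]]:
    """Map each sender to (message volume, distinct recipients, recipient entropy in bits).

    `events` holds hypothetical (sender_ip, recipient) pairs, one per message.
    """
    per_sender: Dict[str, Counter] = defaultdict(Counter)
    for sender, recipient in events:
        per_sender[sender][recipient] += 1

    features: Dict[str, Tuple[int, int, float]] = {}
    for sender, recipients in per_sender.items():
        volume = sum(recipients.values())
        # H = sum_r p_r * log2(1 / p_r), with p_r = count_r / volume
        entropy = sum((c / volume) * math.log2(volume / c) for c in recipients.values())
        features[sender] = (volume, len(recipients), entropy)
    return features
```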
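
The abstract does not describe how GSVM-BA chooses which samples to keep when it rebalances the data set. The sketch below therefore swaps in plain random undersampling of the majority class. It is not GSVM-BA. It only shows where a rebalancing step sits in the pipeline: the final classifier (SVM in the abstract's best configuration) would be trained on the balanced output. Labels are assumed to be binary, with 1 for the positive class.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    samples: Sequence[T], labels: Sequence[int], seed: int = 0
) -> Tuple[List[T], List[int]]:
    """Randomly drop majority-class samples until both classes have equal size.

    Plain random undersampling (NOT GSVM-BA). Label 1 is positive; any other
    label is treated as negative. Original sample order is preserved.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # seeded for reproducible subsets
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [samples[i] for i in kept], [labels[i] for i in kept]
```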
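
The abstract compares GSVM-BA with cost-sensitive learning but does not say which cost-sensitive variant was used. Another common variant, not shown here, assigns a larger training penalty to errors on the minority class. The sketch below instead takes the decision-theoretic view, as an assumption rather than the authors' setup. Given a calibrated probability p that a sender is positive and misclassification costs C_FP and C_FN, it predicts positive when that choice has the lower expected cost: p * C_FN > (1 - p) * C_FP, i.e. p > C_FP / (C_FP + C_FN). Function names are hypothetical.

```python
def decision_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which predicting positive minimizes expected cost."""
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def cost_sensitive_label(p_positive: float, cost_fp: float, cost_fn: float) -> int:
    """Return 1 if predicting positive has lower expected cost, else 0.

    Expected cost of predicting positive: (1 - p) * cost_fp
    Expected cost of predicting negative: p * cost_fn
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be a probability")
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return 1 if p_positive * cost_fn > (1.0 - p_positive) * cost_fp else 0
```

With a missed positive nine times as costly as a false alarm, the threshold drops from 0.5 to 0.1. The classifier therefore flags far more borderline senders.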
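
On highly imbalanced data, plain accuracy is misleading. A classifier that always predicts the majority class can score 95% accuracy on a 95:5 split while never detecting the minority class. The abstract does not say which metrics the authors report. One common alternative, assumed here for illustration, is the geometric mean of sensitivity and specificity (G-mean). It is zero whenever either class is never recognized.

```python
import math
from typing import Sequence


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (true positive rate) and specificity (true negative rate).

    Labels must be 0/1; both classes must occur in y_true.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)
```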
