Advances in Spam Filtering Techniques

Authors: Tiago A. Almeida, Akebo Yamakami

DOI: 10.1007/978-3-642-25237-2_12

Keywords: Novelty; Filter (signal processing); Naive Bayes classifier; Minimum description length; Machine learning; Artificial intelligence; Support vector machine; Collaborative filtering; Computer science

Abstract: Nowadays e-mail spam is not a novelty, but it is still an important rising problem with a big economic impact in society. Fortunately, there are different approaches able to automatically detect and remove most of those messages, and the best-known ones are based on machine learning techniques, such as Naive Bayes classifiers and Support Vector Machines. However, there are several different models of Naive Bayes filters, something the spam literature does not always acknowledge. In this chapter, we present and compare seven different versions of Naive Bayes classifiers, the well-known linear Support Vector Machine, and a new method based on the Minimum Description Length principle. Furthermore, we have conducted an empirical experiment on six public and real non-encoded datasets. The results indicate that the proposed filter is easy to implement, incrementally updateable, and clearly outperforms the state-of-the-art spam filters.
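
To make "incrementally updateable" concrete, here is a minimal sketch of one possible Naive Bayes spam filter (a multinomial model with Laplace smoothing) whose counts are updated one message at a time. It is an illustration only and is not the chapter's proposed Minimum Description Length filter, nor necessarily one of the seven Naive Bayes versions it compares. The class name `IncrementalNaiveBayes`, its methods, the "spam"/"ham" labels, and the toy messages are all assumptions introduced for this example.

```python
import math
from collections import Counter


class IncrementalNaiveBayes:
    """Hypothetical multinomial Naive Bayes filter with Laplace smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.total_tokens = {label: 0 for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}
        self.vocabulary = set()

    def update(self, tokens, label):
        """Fold one labeled message into the counts (no retraining needed)."""
        self.token_counts[label].update(tokens)
        self.total_tokens[label] += len(tokens)
        self.message_counts[label] += 1
        self.vocabulary.update(tokens)

    def log_score(self, tokens, label):
        """Log of the smoothed class prior times the smoothed token likelihoods."""
        n_messages = sum(self.message_counts.values())
        score = math.log((self.message_counts[label] + 1) / (n_messages + len(self.LABELS)))
        # One extra vocabulary slot is reserved for tokens never seen in training.
        denominator = self.total_tokens[label] + len(self.vocabulary) + 1
        for token in tokens:
            score += math.log((self.token_counts[label][token] + 1) / denominator)
        return score

    def classify(self, tokens):
        """Return the label with the higher log score (ties go to 'ham')."""
        if self.log_score(tokens, "spam") > self.log_score(tokens, "ham"):
            return "spam"
        return "ham"


# Toy usage: train on four messages, then classify two new ones.
nb = IncrementalNaiveBayes()
nb.update(["cheap", "pills", "buy", "now"], "spam")
nb.update(["buy", "cheap", "watches"], "spam")
nb.update(["meeting", "at", "noon"], "ham")
nb.update(["lunch", "meeting", "tomorrow"], "ham")
print(nb.classify(["cheap", "pills"]))
print(nb.classify(["meeting", "tomorrow"]))
```

Because the model is only a set of counts, each newly labeled message costs one `update` call, which is the property the abstract refers to as incremental updating.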

References (32)
Ion Androutsopoulos, Eirinaios Michelakis, Georgios Paliouras. Learning to Filter Unsolicited Commercial E-Mail. (2006)
Christian Siefkes, Fidelis Assis, Shalendra Chhabra, William S. Yerazunis. Combining Winnow and Orthogonal Sparse Bigrams for Incremental Spam Filtering. European Conference on Principles of Data Mining and Knowledge Discovery, pp. 410-421 (2004). DOI: 10.1007/978-3-540-30116-5_38
Gabriel Wachman, Carla E. Brodley, David Sculley. Spam Filtering Using Inexact String Matching in Explicit Feature Space with On-Line Linear Classifiers. Text Retrieval Conference (2006)
Karl-Michael Schneider. On Word Frequency Information and Negative Evidence in Naive Bayes Text Classification. Advances in Natural Language Processing, pp. 474-485 (2004). DOI: 10.1007/978-3-540-30228-5_42
Peter Grünwald. A Tutorial Introduction to the Minimum Description Length Principle. arXiv: Statistics Theory (2004)
Mehran Sahami, Susan Dumais, Eric Horvitz, David Heckerman. A Bayesian Approach to Filtering Junk E-Mail. National Conference on Artificial Intelligence (1998)
George H. John, Pat Langley. Estimating Continuous Distributions in Bayesian Classifiers. Uncertainty in Artificial Intelligence, pp. 338-345 (1995)
Tiago A. Almeida, Akebo Yamakami, Jurandy Almeida. Filtering Spams Using the Minimum Description Length Principle. Proceedings of the 2010 ACM Symposium on Applied Computing - SAC '10, pp. 1854-1858 (2010). DOI: 10.1145/1774088.1774481
Shugang Liu, Kebin Cui. Applications of Support Vector Machine Based on Boolean Kernel to Spam Filtering. Modern Applied Science, vol. 3, pp. 27- (2009). DOI: 10.5539/MAS.V3N10P27
Muhammad N. Marsono, M. Watheq El-Kharashi, Fayez Gebali. Targeting Spam Control on Middleboxes: Spam Detection Based on Layer-3 E-Mail Content Classification. Computer Networks, vol. 53, pp. 835-848 (2009). DOI: 10.1016/J.COMNET.2008.11.012