Email Spam Filtering: A Systematic Review

Author: Gordon V. Cormack

Keywords: Information extraction, Computer science, Communication source, Email spam, Information retrieval, Forum spam, Focus (computing), Filter (signal processing), Spambot, Component (UML)

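Abstract: Spam is information crafted to be delivered to a large number of recipients, in spite of their wishes. A spam filter is an automated tool to recognize spam so as to prevent its delivery. The purposes of spam and spam filters are diametrically opposed: spam is effective if it evades filters, while a filter is effective if it recognizes spam. The circular nature of these definitions, along with their appeal to the intent of sender and recipient, make them difficult to formalize. A typical email user has a working definition no more formal than "I know it when I see it." Yet, current spam filters are remarkably effective, more effective than might be expected given the level of uncertainty and debate over a formal definition of spam, and more effective than might be expected given the state-of-the-art information retrieval and machine learning methods for seemingly similar problems. But are they effective enough? Which are better? How might they be improved? Will their effectiveness be compromised by more cleverly crafted spam? We survey current and proposed spam filtering techniques with particular emphasis on how well they work. Our primary focus is spam filtering in email; similarities and differences with spam filtering in other communication and storage media, such as instant messaging and the Web, are addressed peripherally. In doing so we examine the definition of spam, the user's information requirements and the role of the spam filter as one component of a large and complex information universe. Well-known methods are detailed sufficiently to make the exposition self-contained; however, the focus is on considerations unique to spam. Comparisons, wherever possible, use common evaluation measures, and control for differences in experimental setup. Such comparisons are not easy, as benchmarks, measures, and methods for evaluating spam filters are still evolving. We survey these efforts, their results and their limitations. In spite of recent advances in evaluation methodology, many uncertainties (including widely held but unsubstantiated beliefs) remain as to the effectiveness of spam filtering techniques and as to the validity of spam filter evaluation methods. We outline several uncertainties and propose experimental methods to address them.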
