Exploiting redundancy in natural language to penetrate Bayesian spam filters

Authors: Engin Kirda, Christopher Kruegel, Christoph Karlberger, Günther Bayler

Abstract: Today's attacks against Bayesian spam filters attempt to keep the content of spam mails visible to humans, but obscured to filters. A common technique is to fool filters by appending additional words to a spam mail. Because these words appear very rarely in spam mails, filters are inclined to classify the mail as legitimate. The idea we present in this paper leverages the fact that natural language typically contains synonyms. Synonyms are different words that describe similar terms and concepts. Such words often have significantly different spam probabilities. Thus, an attacker might be able to penetrate Bayesian filters by replacing suspicious words with innocuous words that have the same meaning. A precondition for the success of such an attack is that the Bayesian spam filters of different users assign similar spam probabilities to similar tokens. We first examine whether this precondition is met; afterwards, we measure the effectiveness of an automated substitution attack by creating a test set of spam messages that are tested against SpamAssassin, DSPAM, and Gmail.
