Spam deobfuscation using a hidden Markov model

Authors: Andrew Y. Ng, Honglak Lee

Abstract: To circumvent spam filters, many spammers attempt to obfuscate their emails by deliberately misspelling words or introducing other errors into the text. For example, viagra may be written vigra, or mortgage written m0rt gage. Even though humans have little difficulty reading obfuscated emails, most content-based filters are unable to recognize these words. In this paper, we present a hidden Markov model for deobfuscating emails. We empirically demonstrate that our model is robust to many types of obfuscation, including misspellings, incorrect segmentations (adding/removing spaces), and substitutions/insertions of non-alphabetic characters.
