Two phase approach for spam-mail filtering

Authors: Sin-Jae Kang, Sae-Bom Lee, Jong-Wan Kim, In-Gil Nam

DOI: 10.1007/978-3-540-30497-5_124

Abstract: This paper describes a two-phase method for filtering spam mails based on textual information and hyperlinks. Since the body of a spam mail has little text information, it provides insufficient hints to distinguish spam from legitimate mails. To resolve this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote web pages, and extract hints (i.e., features) from the original email body and the fetched web pages. We divide the hints into two kinds of information: definite information and less definite information. In our experiment, fetching web pages achieved an improvement in F-measure of 9.4% over using the header only.
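
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which features or classifier are used, so every name here (`extract_links`, `two_phase_filter`, `blocked_senders`, `definite_keywords`, `word_weights`, `threshold`) is a hypothetical placeholder, and a plain weighted word score stands in for the second phase. Page fetching is passed in as a `fetch` callable so the sketch stays independent of any particular HTTP library.

```python
import re
from typing import Callable, Dict, List, Set

# Simplistic URL pattern (an assumption; real mail needs proper HTML/MIME parsing).
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_links(body: str) -> List[str]:
    """Return the http(s) hyperlinks found in an email body."""
    return URL_RE.findall(body)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def two_phase_filter(
    sender: str,
    body: str,
    fetch: Callable[[str], str],
    blocked_senders: Set[str],
    definite_keywords: Set[str],
    word_weights: Dict[str, float],
    threshold: float = 1.0,
) -> bool:
    """Return True if the mail is judged to be spam (illustrative only)."""
    # Enrich the sparse body with the text of every linked page.
    pages = [fetch(url) for url in extract_links(body)]
    tokens = tokenize(" ".join([body, *pages]))

    # Phase 1: definite information decides immediately.
    if sender.lower() in blocked_senders or definite_keywords & set(tokens):
        return True

    # Phase 2: less definite information, here a hypothetical weighted
    # word score standing in for whatever classifier is actually used.
    score = sum(word_weights.get(token, 0.0) for token in tokens)
    return score >= threshold
```

In this sketch a mail whose body contains only a link can still be scored, because the words of the linked page are added to the token list before either phase runs. Passing a stub such as `lambda url: cached_pages.get(url, "")` for `fetch` (with `cached_pages` a hypothetical dictionary) lets the function be exercised without network access.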
