Authors: Sin-Jae Kang, Sae-Bom Lee, Jong-Wan Kim, In-Gil Nam
DOI: 10.1007/978-3-540-30497-5_124
Keywords:
Abstract: This paper describes a two-phase method for filtering spam mail based on textual information and hyperlinks. Because the mail body contains little text, it provides insufficient hints to distinguish spam from legitimate mail. To resolve this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote webpages, and extract hints (i.e., features) from the original email and the fetched webpages. We divide these hints into two kinds of information: definite information and less definite information. In our experiment, fetching web pages improved the F-measure by 9.4% over using the header only.
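
The link-following step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `extract_links`, `html_to_text`, and `expand_email_text` are hypothetical, the URL pattern is a simplification, and the page fetcher is passed in as a parameter (an assumption made here so the example runs without network access).

```python
import re
from html.parser import HTMLParser
from typing import Callable, List

# Simplified URL pattern (an assumption; real mail needs more careful parsing).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_links(body: str) -> List[str]:
    """Return the distinct URLs found in an email body, in order of appearance."""
    links: List[str] = []
    for url in URL_PATTERN.findall(body):
        url = url.rstrip(".,;:)")  # drop trailing sentence punctuation
        if url not in links:
            links.append(url)
    return links


def html_to_text(html: str) -> str:
    """Reduce an HTML page to its visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def expand_email_text(body: str, fetch: Callable[[str], str]) -> str:
    """Append the text of every linked page to the original email body.

    `fetch` maps a URL to its HTML; pages that fail to load are skipped.
    """
    parts = [body]
    for url in extract_links(body):
        try:
            parts.append(html_to_text(fetch(url)))
        except Exception:
            continue  # unreachable page: keep whatever text we already have
    return "\n".join(parts)
```

In use, `fetch` would wrap an HTTP client, and the expanded text would then be passed to whatever feature extractor and classifier the filter uses. Injecting the fetcher also makes it easy to add timeouts or caching, which matters when following links found in untrusted mail.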