Authors: Sattar Seifollahi, Adil Bagirov, Robert Layton, Iqbal Gondal
DOI: 10.1007/S11063-017-9593-7
Keywords: Similarity measure, tf–idf, WordNet, Data mining, Vocabulary, Cluster analysis, Phishing, Information retrieval, Computational intelligence, Weighting, Computer science
Abstract: Phishing has given attackers the power to masquerade as legitimate users of organizations, such as banks, to scam money and private information from victims. Phishing is so widespread that combating the phishing attacks could overwhelm the victim organization. It is important to group the phishing attacks to formulate an effective defence mechanism. In this paper, we use clustering methods to analyze and characterize phishing emails and perform their relative attribution. Emails are first tokenized to a bag-of-words space and then transformed to a numeric vector space using the frequencies of words in documents. The WordNet vocabulary is used to take the effects of similar words into account and to reduce sparsity. The word similarity measure is combined with the term frequencies to introduce a novel text transformation into numeric features. To improve the accuracy, we apply inverse document frequency weighting, which gives higher weights to features used by fewer authors. The k-means algorithm and three recently introduced optimization based algorithms, MS-MGKM, INCA and DCClust, are applied for clustering purposes. The algorithms indicate the existence of well separated clusters in the phishing emails dataset.
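
As a rough illustration of the bag-of-words and inverse document frequency steps mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the standard document-level idf, log(N / df), whereas the abstract describes weighting by how many authors use a feature; it also omits the WordNet similarity step and the clustering itself. The function name `tfidf_vectors` and the sample emails are hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with tf * idf weights for each document.

    Assumption: idf is computed over documents, not authors.
    """
    # Naive tokenizer: lowercase, keep runs of ASCII letters.
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    # idf = log(N / df): words found in fewer documents get larger weights.
    idf = {word: math.log(n_docs / doc_freq[word]) for word in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term frequencies for this document
        vectors.append([counts[word] * idf[word] for word in vocabulary])
    return vocabulary, vectors


# Hypothetical sample emails, for illustration only.
emails = [
    "verify your bank account now",
    "your account is suspended verify now",
    "claim your lottery prize",
]
vocabulary, vectors = tfidf_vectors(emails)
```

In this sketch a word that occurs in every email (here "your") receives weight zero, while a word confined to a single email receives the largest idf. The resulting vectors could then be passed to any clustering routine.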