Optimization Based Clustering Algorithms for Authorship Analysis of Phishing Emails

Authors: Sattar Seifollahi, Adil Bagirov, Robert Layton, Iqbal Gondal

Keywords: Similarity measure, tf–idf, WordNet, Data mining, Vocabulary, Cluster analysis, Phishing, Information retrieval, Computational intelligence, Weighting, Computer science

Abstract: Phishing has given attackers the power to masquerade as legitimate users of organizations, such as banks, to scam money and private information from victims. Phishing is so widespread that combating the phishing attacks could overwhelm the victim organization. It is important to group the phishing attacks to formulate an effective defence mechanism. In this paper, we use clustering methods to analyze and characterize phishing emails and perform their relative attribution. Emails are first tokenized to a bag-of-words space and then transformed to a numeric vector space using the frequencies of words in documents. The WordNet vocabulary is used to take the effects of similar words into account and to reduce sparsity. The word similarity measure is combined with the term frequencies to introduce a novel text transformation into numeric features. To improve the accuracy, we apply inverse document frequency weighting, which gives higher weights to features used by fewer authors. The k-means algorithm and three recently introduced optimization based algorithms, MS-MGKM, INCA and DCClust, are applied for clustering purposes. The algorithms indicate the existence of well separated clusters in the phishing emails dataset.
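The pipeline described in the abstract (tokenize, count term frequencies, apply inverse document frequency weighting, then cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy emails, the function names `tokenize`, `tfidf_vectors` and `kmeans`, and the seeding rule are all hypothetical. It uses the standard tf * log(N / df) weighting over documents as a stand-in for the paper's author-based weighting, leaves out the WordNet similarity step, and uses plain Lloyd's k-means in place of MS-MGKM, INCA or DCClust.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on any non-letter character (a simplification)."""
    word, tokens = [], []
    for ch in text.lower():
        if ch.isalpha():
            word.append(ch)
        elif word:
            tokens.append("".join(word))
            word = []
    if word:
        tokens.append("".join(word))
    return tokens


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf * log(N / df) weights."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in token_lists for t in set(toks))
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means, seeded with the first k points (illustrative choice)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy emails (invented for illustration, not taken from the paper's dataset).
emails = [
    "verify your bank account now",
    "win a free prize today",
    "your bank account needs to verify",
    "free prize win today",
]
vocab, vectors = tfidf_vectors(emails)
labels = kmeans(vectors, k=2)
print(labels)
```

Seeding with the first k points keeps the sketch deterministic, but it is a poor choice in practice; the sensitivity of k-means to its starting points is one reason to consider optimization based alternatives such as those named in the abstract.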

