Optimization Based Clustering Algorithms for Authorship Analysis of Phishing Emails

Authors: Sattar Seifollahi, Adil Bagirov, Robert Layton, Iqbal Gondal

DOI: 10.1007/S11063-017-9593-7

Keywords: Similarity measure, tf–idf, WordNet, Data mining, Vocabulary, Cluster analysis, Phishing, Information retrieval, Computational intelligence, Weighting, Computer science

Abstract: Phishing has given attackers the power to masquerade as legitimate users of organizations, such as banks, to scam money and private information from victims. Phishing is so widespread that combating phishing attacks could overwhelm the victim organization. It is important to group phishing attacks in order to formulate effective defence mechanisms. In this paper, we use clustering methods to analyze and characterize phishing emails and to perform their relative attribution. Emails are first tokenized into a bag-of-words space and then transformed into numeric vectors using the frequencies of words in documents. The WordNet vocabulary is used to take the effect of similar words into account and to reduce sparsity. The word similarity measure is combined with term frequencies to introduce a novel transformation of text into features. To improve accuracy, we apply inverse document frequency weighting, which gives higher weights to features used by fewer authors. The k-means algorithm and three recently introduced optimization-based clustering algorithms, MS-MGKM, INCA and DCClust, are applied for clustering purposes. The clustering algorithms indicate the existence of well-separated clusters in the phishing email dataset.
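The core of the pipeline described in the abstract (bag-of-words tokenization, term-frequency vectors, inverse document frequency weighting, then k-means) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It omits the WordNet word-similarity step and the MS-MGKM, INCA and DCClust algorithms. It computes idf over documents rather than over authors, and it uses a smoothed idf formula, ln((1 + n) / (1 + df)) + 1. It also seeds k-means with the first k vectors. The function names and sample emails are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens (bag of words)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document to a tf-idf vector over a shared, sorted vocabulary.

    Assumption: idf is computed over documents (not authors) and smoothed.
    """
    bags = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*bags))
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for b in bags if t in b) for t in vocab}
    # Smoothed idf: terms that occur in fewer documents get larger weights.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for b in bags:
        total = sum(b.values()) or 1  # guard against empty documents
        # Relative term frequency multiplied by the term's idf weight.
        vectors.append([b[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means, seeded with the first k points for determinism."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centers are stable too
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    emails = [
        "verify your bank account password now",
        "your parcel delivery failed track package",
        "bank account locked verify password",
        "track your parcel delivery package today",
    ]
    vocab, vecs = tfidf_vectors(emails)
    labels, _ = kmeans(vecs, k=2)
    print(labels)
```

On these four toy emails, the two "bank account" messages fall into one cluster and the two "parcel delivery" messages into the other. Seeding with the first k points keeps the sketch deterministic. Practical implementations usually use randomized or k-means++ seeding instead.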
