A parameter-free hybrid clustering algorithm used for malware categorization

Authors: ZhiXue Han, Shaorong Feng, Yanfang Ye, Qingshan Jiang

DOI: 10.1109/ICASID.2009.5276982

Abstract: Numerous attacks carried out by malware such as viruses, backdoors, spyware, trojans, and worms present a major security threat to computer users. The most significant line of defense against malware is anti-virus (AV) products, which detect, remove, and characterize these threats. The ability of AV products to successfully characterize threats depends greatly on the method used for categorizing malware profiles into groups. Clustering malware into different families is therefore one of the topics of great interest. In this paper, based on an analysis of instructions extracted from malware samples, we propose a novel parameter-free hybrid clustering algorithm (PFHC) that combines the merits of hierarchical clustering and K-means algorithms for malware clustering. It can not only generate a stable initial division but also give the best K. PFHC first uses agglomerative hierarchical clustering as its frame, starting with N singleton clusters, each of which includes exactly one sample. It then reuses the centroids of the upper level in every level and merges the two nearest clusters, and finally adopts K-means iteration to achieve an approximately globally optimal division. PFHC evaluates the clustering validity of each iteration procedure and generates the best K by comparing the validity values. Promising studies on a real daily data collection illustrate that, compared with popular existing clustering approaches, the proposed PFHC algorithm always generates much higher quality clusters and can be well used for malware categorization.
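The loop described in the abstract (start from N singleton clusters, merge the two nearest clusters, refine with K-means seeded by the carried-over centroids, score each level, keep the best K) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the distance measure or the validity index, so Euclidean distance and the mean silhouette width are stand-ins, and the names `pfhc_sketch`, `kmeans`, and `silhouette` are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(data, centroids, max_iter=100):
    """Lloyd iterations seeded with the given centroids (no random init)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)),
                          key=lambda k: math.dist(x, centroids[k]))
                      for x in data]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in range(len(centroids)):
            members = [x for x, l in zip(data, labels) if l == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = centroid(members)
    return labels, centroids

def silhouette(data, labels):
    """Mean silhouette width; an ASSUMED stand-in for the paper's validity index."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, l in enumerate(labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # singletons contribute 0 by convention
        a = sum(math.dist(data[i], data[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(data[i], data[j]) for j in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(data)

def pfhc_sketch(data):
    """Return (best_k, labels) over K = N-1 .. 2; needs at least 3 samples."""
    data = [tuple(x) for x in data]
    centroids = list(data)           # N singleton clusters
    labels = list(range(len(data)))
    best_score, best_k, best_labels = -2.0, None, None
    while len(centroids) > 2:
        # Agglomerative step: find the two nearest centroids.
        i, j = min(((p, q) for p in range(len(centroids))
                    for q in range(p + 1, len(centroids))),
                   key=lambda pq: math.dist(centroids[pq[0]], centroids[pq[1]]))
        members = [x for x, l in zip(data, labels) if l in (i, j)]
        merged = centroid(members) if members else centroid([centroids[i], centroids[j]])
        # Reuse the remaining centroids of the previous level as K-means seeds.
        seeds = [c for k, c in enumerate(centroids) if k not in (i, j)] + [merged]
        labels, centroids = kmeans(data, seeds)
        score = silhouette(data, labels)
        if score > best_score:
            best_score, best_k, best_labels = score, len(centroids), labels
    return best_k, best_labels
```

For example, calling `pfhc_sketch` on two well-separated groups of three 2-D points each should select K = 2 and put each group in its own cluster. Because the seeds at every level come from the previous level rather than from random initialization, repeated runs on the same input give the same division, which is one plausible reading of the "stable initial division" mentioned in the abstract.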
