Attacking DBSCAN for Fun and Profit.

Authors: Jonathan Crussell, Philip Kegelmeyer

DOI: 10.1137/1.9781611974010.27

Abstract: Many security applications depend critically on clustering. However, we do not know of any clustering algorithms that were designed with an adversary in mind. An intelligent adversary may be able to use this to her advantage to subvert the security of the application. Already, adversaries use obfuscation and other techniques to alter the representation of their inputs in feature space to avoid detection. As one example, spam email often mimics normal email. In this work, we investigate a more active attack, in which an adversary attempts to subvert clustering analysis by feeding in carefully crafted data points. Specifically, in this work we explore how an attacker can subvert DBSCAN, a popular density-based clustering algorithm. We explore a “confidence attack,” where an adversary seeks to poison the clusters to the point that the defender loses confidence in the utility of the system. This may result in the system being abandoned, or worse, waste the defender’s time investigating false alarms. While our attacks generalize to all DBSCAN-based tools, we focus our evaluation on AnDarwin, a tool designed to detect plagiarized Android apps. We show that an adversary can merge arbitrary clusters by connecting them with “bridges”, that even a small number of merges can greatly degrade clustering performance, and that the defender has limited recourse when relying solely on DBSCAN. Finally, we propose a remediation process that uses machine learning and features based on outlier measures that are orthogonal to the underlying clustering problem to remove injected points.
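
The "bridge" idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' code and does not involve AnDarwin; it uses a deliberately minimal pure-Python DBSCAN, and the point coordinates, `eps`, `min_pts`, and helper names are all assumptions chosen for illustration. Two well-separated groups form two clusters, and a chain of injected points spaced within `eps` of each other causes DBSCAN to report a single merged cluster.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

def count_clusters(labels):
    return len(set(labels) - {-1})

# Hypothetical toy data: two groups of four points, far apart on the x-axis.
clean = [(x, 0) for x in range(0, 4)] + [(x, 0) for x in range(10, 14)]
# Hypothetical injected "bridge": points filling the gap at unit spacing.
bridge = [(x, 0) for x in range(4, 10)]

n_before = count_clusters(dbscan(clean, eps=1.5, min_pts=3))
n_after = count_clusters(dbscan(clean + bridge, eps=1.5, min_pts=3))
print(n_before, n_after)  # 2 1
```

The merge happens because every bridge point has at least `min_pts` neighbors within `eps`, so each one is a core point and density-reachability chains from one group to the other. How many points a real bridge needs depends on the feature space and on the defender's parameters.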

References (14)
Jonathan Crussell, Clint Gibler, Hao Chen. AnDarwin: Scalable Detection of Semantically Similar Android Applications. European Symposium on Research in Computer Security, pp. 182-199 (2013). DOI: 10.1007/978-3-642-40203-6_11
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. Knowledge Discovery and Data Mining, pp. 226-231 (1996).
J.G. Dutrisac, D.B. Skillicorn. Hiding clusters in adversarial settings. Intelligence and Security Informatics, pp. 185-187 (2008). DOI: 10.1109/ISI.2008.4565051
M. Slaney, M. Casey. Locality-Sensitive Hashing for Finding Nearest Neighbors [Lecture Notes]. IEEE Signal Processing Magazine, vol. 25, pp. 128-131 (2008). DOI: 10.1109/MSP.2007.914237
Ji-Rong Wen, HongJiang Zhang, Jian-Yun Nie. Query Clustering Using User Logs. ACM Transactions on Information Systems, vol. 20, pp. 59-81 (2002).
Battista Biggio, Ignazio Pillai, Samuel Rota Bulò, Davide Ariu, Marcello Pelillo, Fabio Roli. Is data clustering in adversarial settings secure? Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, pp. 87-98 (2013). DOI: 10.1145/2517312.2517321
Murat Kantarcıoğlu, Bowei Xi, Chris Clifton. Classifier evaluation and attribute selection against active adversaries. Data Mining and Knowledge Discovery, vol. 22, pp. 291-335 (2011). DOI: 10.1007/S10618-010-0197-3
Nguyen Xuan Vinh, Julien Epps, James Bailey. Information theoretic measures for clusterings comparison. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 1073-1080 (2009). DOI: 10.1145/1553374.1553511
Clint Gibler, Ryan Stevens, Jonathan Crussell, Hao Chen, Hui Zang, Heesook Choi. AdRob: examining the landscape and impact of Android application plagiarism. International Conference on Mobile Systems, Applications, and Services, pp. 431-444 (2013). DOI: 10.1145/2462456.2464461
Jeffrey Erman, Martin Arlitt, Anirban Mahanti. Traffic classification using clustering algorithms. Proceedings of the 2006 SIGCOMM Workshop on Mining Network Data (MineNet '06), pp. 281-286 (2006). DOI: 10.1145/1162678.1162679