A3Ident: A Two-phased Approach to Identify the Leading Authors of Android Apps

Authors: Wei Wang, Guozhu Meng, Haoyu Wang, Kai Chen, Weimin Ge

DOI: 10.1109/ICSME46990.2020.00064

Abstract: Authorship identification is the process of identifying and classifying authors through given codes. Authorship identification can be used in a wide range of software domains, e.g., code authorship disputes, plagiarism detection, and exposure of attackers' identity. Besides the inherent challenges from legacy software development, the framework programming and crowdsourcing mode in Android raise the difficulties of authorship identification significantly. More specifically, widespread third-party libraries and inherited components (e.g., classes, methods, and variables) dilute the primary code within the entire app and blur the boundaries of code written by different authors. However, prior research has not well addressed these challenges. To this end, we design a two-phased approach to attribute the primary code of an Android app to a specific developer. In the first phase, we put forward three types of strategies to identify the relationships between Java packages in an app, which consist of context, semantic, and structural relationships. A package aggregation algorithm is developed to cluster all packages that are, with high probability, written by the same authors. In the second phase, we develop features to capture authors' coding habits and code stylometry. Based on that, we generate fingerprints for an author from its developed apps and employ several machine learning algorithms for authorship classification. We evaluate our approach on datasets that contain 15,666 apps from 257 distinct developers and achieve a 92.5% accuracy rate on average. Additionally, we test it on 2,900 obfuscated apps and can classify them with an accuracy rate of 80.4%.
