Authors: Wei Wang, Guozhu Meng, Haoyu Wang, Kai Chen, Weimin Ge
DOI: 10.1109/ICSME46990.2020.00064
Keywords:
Abstract: Authorship identification is the process of identifying and classifying authors through given code. It can be used in a wide range of software domains, e.g., code authorship disputes, plagiarism detection, and exposure of attackers’ identity. Besides the inherent challenges from legacy development, framework programming and the crowdsourcing mode in Android raise the difficulties of authorship identification significantly. More specifically, widespread third-party libraries and inherited components (e.g., classes, methods, and variables) dilute the primary code within the entire app and blur the boundaries of code written by different authors. However, prior research has not addressed these challenges well. To this end, we design a two-phased approach to attribute the primary code of an Android app to a specific developer. In the first phase, we put forward three types of strategies to identify the relationships between Java packages in an app, which consist of context, semantic, and structural relationships. A package aggregation algorithm is developed to cluster all packages that are, with high probability, written by the same authors. In the second phase, we develop features to capture authors’ coding habits and code stylometry. Based on that, we generate fingerprints for an author from its apps and employ several machine learning algorithms for authorship classification. We evaluate our approach on datasets that contain 15,666 apps from 257 distinct developers and achieve a 92.5% accuracy rate on average. Additionally, we test it on 2,900 obfuscated apps and classify them with an accuracy rate of 80.4%.
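Illustrative note: the first phase described in the abstract clusters Java packages that are likely written by the same authors. The abstract does not specify the algorithm, so the following is only a minimal sketch of one way such an aggregation could work, assuming a pairwise "related" predicate and a union-find grouping. The function names, the sample package names, and the prefix-based predicate are all hypothetical stand-ins, not the paper's context, semantic, or structural relationships.

```python
from itertools import combinations


def common_prefix_depth(a: str, b: str) -> int:
    """Count the leading dot-separated segments two package names share."""
    depth = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        depth += 1
    return depth


def aggregate_packages(packages, related):
    """Cluster packages with union-find.

    Any two packages for which related(a, b) is True end up in the same
    cluster, directly or through a chain of related packages.
    """
    parent = {p: p for p in packages}

    def find(p):
        # Follow parent links to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(packages, 2):
        if related(a, b):
            parent[find(a)] = find(b)

    clusters = {}
    for p in packages:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(c) for c in clusters.values())


def shares_two_segments(a: str, b: str) -> bool:
    # Hypothetical relationship: the names share at least two leading
    # segments. A real system would combine richer signals.
    return common_prefix_depth(a, b) >= 2


# Hypothetical package names, for illustration only.
packages = [
    "com.example.app.ui",
    "com.example.app.net",
    "com.squareup.okhttp",
    "com.squareup.okhttp.internal",
    "org.apache.http",
]
clusters = aggregate_packages(packages, shares_two_segments)
for cluster in clusters:
    print(cluster)
```

In this sketch the predicate is swappable: a scoring function that combines several relationship signals and compares the result against a threshold could be passed in place of `shares_two_segments` without changing the clustering code.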