Cyber-guided Deep Neural Network for Malicious Repository Detection in GitHub

Authors: Yiming Zhang, Yujie Fan, Shifu Hou, Yanfang Ye, Xusheng Xiao

DOI: 10.1109/ICBK50248.2020.00071

Abstract: As the largest source code repository, GitHub plays a vital role in the modern social coding ecosystem for producing production software. Despite the apparent benefits of this paradigm, its potential security risks have been largely overlooked (e.g., malicious code or repositories can be easily embedded and distributed). To address this imminent issue, in this paper we propose a novel framework (named GitCyber) to automate malicious repository detection in GitHub, as a first attempt. In GitCyber, we first extract contents from the repositories hosted on GitHub as the inputs for a deep neural network (DNN), and then incorporate cybersecurity domain knowledge modeled by a heterogeneous information network (HIN) to design a cyber-guided loss function in the learning objective of the DNN, so as to assure classification performance while preserving consistency with the observational domain knowledge. Comprehensive experiments based on large-scale data collected from GitHub demonstrate that the proposed GitCyber outperforms the state of the art in malicious repository detection.
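The idea of a knowledge-guided loss can be illustrated with a minimal sketch. The abstract does not give the actual form of GitCyber's cyber-guided loss, so everything below is an assumption made for illustration: the function name `cyber_guided_loss`, the `related_pairs` triples (standing in for repository relatedness that might be derived from an HIN), the squared-difference penalty, and the `weight` hyperparameter are all hypothetical.

```python
import math


def cyber_guided_loss(probs, labels, related_pairs, weight=0.5):
    """Binary cross-entropy plus a consistency penalty (illustrative only).

    probs:         predicted probability of "malicious" for each repository
    labels:        ground-truth labels (1 = malicious, 0 = benign)
    related_pairs: (i, j, similarity) triples; hypothetically derived from
                   an HIN, not taken from the paper
    weight:        hypothetical trade-off between the two terms
    """
    eps = 1e-12

    # Standard classification term: mean binary cross-entropy.
    ce = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        ce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    ce /= len(probs)

    # Knowledge term: related repositories should receive similar scores.
    penalty = 0.0
    for i, j, sim in related_pairs:
        penalty += sim * (probs[i] - probs[j]) ** 2
    if related_pairs:
        penalty /= len(related_pairs)

    return ce + weight * penalty


labels = [1, 1, 0]
pairs = [(0, 1, 1.0)]  # repositories 0 and 1 are assumed to be related
loss_consistent = cyber_guided_loss([0.9, 0.8, 0.2], labels, pairs)
loss_inconsistent = cyber_guided_loss([0.9, 0.1, 0.2], labels, pairs)
```

In this sketch, predictions that disagree on two related repositories incur a larger loss than predictions that agree, which is one simple way a training objective could be kept consistent with domain knowledge.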

References (33)
Andrei Venzhega, Polina Zhinalieva, Nikolay Suboch. Graph-based malware distributors detection. The Web Conference, pp. 1141-1144 (2013). DOI: 10.1145/2487788.2488136
Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim. Proceedings of the VLDB Endowment, vol. 4, pp. 992-1003 (2011). DOI: 10.14778/3402707.3402736
Yanfang Ye, Tao Li, Shenghuo Zhu, Weiwei Zhuang, Egemen Tas, Umesh Gupta, Melih Abdulhayoglu. Combining file content and file relations for cloud based malware detection. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 222-230 (2011). DOI: 10.1145/2020408.2020448
Yizhou Sun, Rick Barber, Manish Gupta, Charu C. Aggarwal, Jiawei Han. Co-author Relationship Prediction in Heterogeneous Bibliographic Networks. Advances in Social Networks Analysis and Mining, pp. 121-128 (2011). DOI: 10.1109/ASONAM.2011.112
Ferdian Thung, Tegawende F. Bissyande, David Lo, Lingxiao Jiang. Network Structure of Social Coding in GitHub. Conference on Software Maintenance and Reengineering, pp. 323-326 (2013). DOI: 10.1109/CSMR.2013.41
Daniel Pletea, Bogdan Vasilescu, Alexander Serebrenik. Security and emotion: sentiment analysis of security discussions on GitHub. Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014), pp. 348-351 (2014). DOI: 10.1145/2597073.2597117
Kelly Blincoe, Jyoti Sheoran, Sean Goggins, Eva Petakovic, Daniela Damian. Understanding the popular users. Information & Software Technology, vol. 70, pp. 30-39 (2016). DOI: 10.1016/J.INFSOF.2015.10.002
Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, Xiang Li. Meta Structure: Computing Relevance in Large Heterogeneous Information Networks. Knowledge Discovery and Data Mining, pp. 1595-1604 (2016). DOI: 10.1145/2939672.2939815
Carey Nachenberg, Christos Faloutsos, Duen Horng “Polo” Chau, Jeffrey Wilhelm. Polonium: Tera-Scale Graph Mining and Inference for Malware Detection (2011).
Bogdan Vasilescu, Vladimir Filkov, Alexander Serebrenik. StackOverflow and GitHub: Associations between Software Development and Crowdsourced Knowledge. International Conference on Social Computing, pp. 188-195 (2013). DOI: 10.1109/SOCIALCOM.2013.35