Authors: Triet Huynh Minh Le, David Hin, Roland Croft, M. Ali Babar
Keywords:
Abstract: Security is an increasing concern in software development. Developer Question and Answer (Q&A) websites host a large amount of security discussion. Supervised machine learning could automate the mining of these posts; however, the required negative (non-security) class is too expensive to obtain. We propose a novel learning framework, PUMiner, to automatically mine security posts from Q&A websites. PUMiner builds a context-aware embedding model to extract features of the posts, and then develops a two-stage PU model to identify security content using the labelled Positive and Unlabelled posts. We evaluate PUMiner on more than 17.2 million posts on Stack Overflow and 52,611 posts on Security StackExchange. We show that PUMiner is effective, with a validation performance of at least 0.85 across all model configurations. Moreover, the Matthews Correlation Coefficient (MCC) of PUMiner is 0.906, 0.534 and 0.084 points higher than that of one-class SVM, positive-similarity filtering, and one-stage PU models on unseen testing posts, respectively. PUMiner also performs well, with an MCC of 0.745, for scenarios where string matching totally fails. Even when the ratio of labelled positive posts to unlabelled ones is only 1:100, PUMiner still achieves a strong MCC of 0.65, which is 160% better than that of fully-supervised learning. Using PUMiner, we provide the largest and most up-to-date collection of security content on Q&A websites for practitioners and researchers.
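The abstract reports its results as MCC values. As a reading aid, here is a minimal sketch of the standard MCC formula computed from confusion-matrix counts. The function name `mcc` and all counts are hypothetical illustrations, not data or code from the paper, and returning 0.0 for a zero denominator is a common convention assumed here.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Assumption: returns 0.0 when the denominator is zero (a common
    convention, e.g. when the classifier predicts a single class).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced set: 10 security posts, 1000 non-security posts.
# A classifier that labels every post as non-security is 99% accurate,
# yet its MCC is 0 because it never finds a positive post.
all_negative = mcc(tp=0, fp=0, fn=10, tn=1000)
accuracy = (0 + 1000) / 1010

# A perfect classifier on the same hypothetical set.
perfect = mcc(tp=10, fp=0, fn=0, tn=1000)
```

This shows why MCC is a reasonable metric for highly imbalanced settings such as the 1:100 ratio mentioned above: unlike accuracy, it does not reward ignoring the minority class.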