Recognizing software bug-specific named entity in software bug repository

Authors: Cheng Zhou, Bin Li, Xiaobing Sun, Hongjing Guo

DOI: 10.1145/3196321.3196335

Abstract: Software bugs are unavoidable in software development and maintenance. To manage bugs effectively, bug tracking systems have been developed to help record and track the bugs of each project. The rich information in these bug repositories makes it possible to build entity-centric knowledge bases that help developers understand and fix bugs. However, existing named entity recognition (NER) systems deal with text that is structured, formal, and well written, with good grammatical structure and few spelling errors, so they cannot be used directly for bug-specific named entity recognition. Bug data, by contrast, are free-form texts that mix natural language with code, abbreviations, and software-specific vocabulary. In this paper, we summarize the characteristics of bug entities, propose a classification method for them, and build a baseline corpus on two open source projects (Mozilla and Eclipse). On this basis, we propose an approach called BNER, which combines a Conditional Random Fields (CRF) model with a word embedding technique. An empirical study was conducted to evaluate the accuracy of BNER; the results show that the designed baseline corpus is suitable for bug-specific named entity recognition and that BNER is effective for cross-project NER.
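The abstract names the two ingredients of BNER (a CRF sequence labeler and word embeddings) but does not give its feature set. The following is a minimal, hypothetical Python sketch of per-token features that a linear-chain CRF could use on bug text: orthographic cues for code-studded tokens (camelCase identifiers, dotted package names, all-caps abbreviations), neighbouring words, and a word-embedding cluster id. The embedding table, the centroids, and every feature name here are assumptions for illustration, not BNER's actual design.

```python
import math
import re
from typing import Dict, List

# HYPOTHETICAL toy embedding table; a real system would train vectors
# (e.g. with word2vec) on a large corpus of bug reports.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "crash": [0.9, 0.1],
    "hang": [0.8, 0.2],
    "firefox": [0.1, 0.9],
    "eclipse": [0.2, 0.8],
}

# HYPOTHETICAL cluster centroids: cluster 0 ~ symptom words, cluster 1 ~ product words.
CENTROIDS: List[List[float]] = [[1.0, 0.0], [0.0, 1.0]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def embedding_cluster(word: str) -> str:
    """Map a word to the index of its nearest centroid, or 'OOV' if it has no vector."""
    vec = TOY_EMBEDDINGS.get(word.lower())
    if vec is None:
        return "OOV"
    best = max(range(len(CENTROIDS)), key=lambda i: cosine(vec, CENTROIDS[i]))
    return str(best)


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for token i of a tokenized bug-report sentence."""
    w = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": w.lower(),
        "word.isupper": w.isupper(),  # abbreviations such as NPE
        "word.istitle": w.istitle(),
        "word.isdigit": w.isdigit(),
        # camelCase identifiers such as getSelection
        "word.camelcase": bool(re.fullmatch(r"[a-z]+(?:[A-Z][a-z0-9]*)+", w)),
        # dotted names such as org.eclipse.swt (also matches version numbers)
        "word.dotted": bool(re.fullmatch(r"\w+(?:\.\w+)+", w)),
        "emb.cluster": embedding_cluster(w),
    }
    # Context features from the neighbouring tokens.
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sent = ["Eclipse", "hang", "in", "org.eclipse.swt.widgets", "getSelection", "NPE"]
    for tok, f in zip(sent, sentence_features(sent)):
        print(tok, f["emb.cluster"], f["word.camelcase"], f["word.dotted"])
```

Each sentence becomes a list of feature dicts, which is the shape CRF toolkits such as sklearn-crfsuite accept as training input alongside one label sequence per sentence. Discretizing embeddings into cluster ids is one common way to feed dense vectors to a CRF, which works most naturally with categorical features.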
