Empirical Principles and an Industrial Case Study in Retrieving Equivalent Requirements via Natural Language Processing Techniques

Authors: Davide Falessi, Giovanni Cantone, Gerardo Canfora

DOI: 10.1109/TSE.2011.122

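Abstract: Though very important in software engineering, linking artifacts of the same type (clone detection) or different types (traceability recovery) is extremely tedious, error-prone, and effort-intensive. Past research focused on supporting analysts with techniques based on Natural Language Processing (NLP) to identify candidate links. Because many NLP techniques exist and their performance varies according to context, it is crucial to define and use reliable evaluation procedures. The aim of this paper is to propose a set of seven principles for evaluating the performance of NLP techniques in identifying equivalent requirements. In this paper, we conjecture, and verify, that NLP techniques perform on a given dataset according to both ability and the odds of identifying equivalent requirements correctly. For instance, when the odds of identifying equivalent requirements are very high, then it is reasonable to expect that NLP techniques will result in good performance. Our key idea is to measure this random factor of the specific dataset(s) in use and then adjust the observed performance accordingly. To support the application of the principles, we report their practical application to a case study that evaluates the performance of a large number of NLP techniques for identifying equivalent requirements in the context of an Italian company in the defense and aerospace domain. The current application context is the evaluation of NLP techniques to identify equivalent requirements. However, most of the proposed principles seem applicable to evaluating any estimation technique aimed at supporting a binary decision (e.g., equivalent/nonequivalent), with the estimate in the range [0,1] (e.g., the similarity provided by the NLP), when the dataset(s) is used as a benchmark (i.e., testbed), independently of the type of estimator (i.e., requirements text) and of the estimation method (e.g., NLP).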
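The abstract's key idea, comparing observed performance with the odds of a correct answer on the dataset in use, can be illustrated with a small sketch. It is not the paper's procedure or any of its seven principles: the token-set Jaccard similarity stands in for an NLP technique, the 0.5 threshold is arbitrary, the subtraction of the chance baseline is only one simple way to relate the two numbers, and the requirements and the names `jaccard` and `evaluate` are hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1] (a stand-in for an NLP technique)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def evaluate(requirements, equivalent_pairs, threshold=0.5):
    """Report observed precision/recall next to the chance-level precision."""
    # Every unordered pair of requirements is a candidate link.
    pairs = list(combinations(range(len(requirements)), 2))
    truth = {tuple(sorted(p)) for p in equivalent_pairs}

    # Binary decision: similarity at or above the threshold means "equivalent".
    predicted = {
        (i, j) for i, j in pairs
        if jaccard(requirements[i], requirements[j]) >= threshold
    }

    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0

    # Odds that a randomly chosen pair is equivalent on this dataset
    # (assumption: this is used here as the "random factor").
    chance = len(truth) / len(pairs) if pairs else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "chance_precision": chance,
        "precision_above_chance": precision - chance,
    }


# Hypothetical toy requirements; pair (0, 1) is labeled equivalent.
requirements = [
    "The system shall log every login attempt",
    "The system shall log each login attempt",
    "The radar shall track up to ten targets",
    "Operators shall export reports as PDF",
]
result = evaluate(requirements, equivalent_pairs=[(0, 1)])
print(result)
```

On a dataset where most pairs are equivalent, `chance_precision` would be high, so a high observed precision would say little about the technique itself; reporting both values side by side makes that visible.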
