Automatic Categorization Tool for Open Software Repositories

Authors: Makoto Matsushita, Shinji Kawaguchi, Katsuro Inoue, Pankaj K. Garg

Keywords: Software analytics, Software quality, Software engineering, Software development, Software construction, Package development process, Social software engineering, Software framework, Software system, Computer science

Abstract: The world of Open Source software has demonstrated the remarkable appeal of communal development. A large number of projects can leverage, reuse, and coordinate their work through Internet and web-based technology. For example, SourceForge currently hosts about sixty thousand software systems. Similar strategies have been suggested for corporate development, with notions like Corporate Source and Progressive Open Source [6, 7]. When used in a corporate setting, infrastructures for project information sharing present new opportunities. For example, one would like to know all the projects that have something in common, so that groups can collaborate and share work. With thousands of projects, manually locating related projects can be difficult. Hence, we propose the use of automatic categorization to find clusters of related projects using only the source code of those projects. Our experiments with a small set of C programs demonstrate the potential of categorizing software systems without human aid.
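The abstract states only that clusters of related projects are found from source code alone, without human aid. The sketch below illustrates one simple way such a categorizer could work; it is not the authors' algorithm. It assumes each project is represented by the identifiers in its C source, that similarity is plain cosine similarity over raw identifier counts, and that clusters are the connected components of pairs whose similarity reaches a hypothetical threshold of 0.3. The related work cited here (e.g., Deerwester et al.; Maletic and Marcus) points toward latent semantic analysis, which this sketch deliberately omits. The names `term_vector`, `cosine`, `cluster`, and `C_KEYWORDS` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical stop list: C keywords (plus common preprocessor words) are
# dropped so only programmer-chosen identifiers describe a project.
C_KEYWORDS = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if", "int",
    "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile",
    "while", "include", "define",
}

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def term_vector(source: str) -> Counter:
    """Count identifier occurrences in a project's C source text."""
    return Counter(t for t in IDENT.findall(source) if t not in C_KEYWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two identifier-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(projects: dict[str, str], threshold: float = 0.3) -> list[set[str]]:
    """Link every pair of projects whose similarity reaches the (hypothetical)
    threshold, then return the connected components as clusters."""
    vecs = {name: term_vector(src) for name, src in projects.items()}
    names = sorted(vecs)
    parent = {n: n for n in names}  # union-find forest

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vecs[a], vecs[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


if __name__ == "__main__":
    # Toy "projects": two networking servers and one numeric library.
    demo = {
        "ftpd": "int sock = socket(AF_INET, SOCK_STREAM, 0); bind(sock, addr, len); listen(sock, 5);",
        "httpd": "int sock = socket(AF_INET, SOCK_STREAM, 0); bind(sock, addr, len); accept(sock, 0, 0);",
        "linalg": "double matrix_det(double *matrix, int n) { return matrix[0] * n; }",
    }
    print(sorted(sorted(g) for g in cluster(demo)))
```

On this toy input the two servers share most of their identifiers (cosine 15/16) and fall into one cluster, while `linalg` shares none and stands alone. Raw counts are used here only for brevity; a weighting scheme or dimensionality reduction such as LSA would usually be needed to cope with synonyms and very common identifiers in real repositories.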

References (13)
Susan T. Dumais, Thomas Landauer. Latent semantic analysis and the measurement of knowledge. 1994.
S. Kawaguchi, P.K. Garg, M. Matsushita, K. Inoue. Automatic categorization algorithm for evolvable software archive. International Workshop on Principles of Software Evolution (IWPSE), pp. 195-200, 2003. doi:10.1109/IWPSE.2003.1231227
Robert W. Schwanke. An intelligent tool for re-engineering software modularity. International Conference on Software Engineering (ICSE), pp. 83-92, 1991. doi:10.5555/256664.256688
S.C. Choi, W. Scacchi. Extracting and restructuring the design of large systems. IEEE Software, vol. 7, pp. 66-71, 1990. doi:10.1109/52.43051
J.I. Maletic, A. Marcus. Using latent semantic analysis to identify similarities in source code to support program understanding. International Conference on Tools with Artificial Intelligence, pp. 46-53, 2000. doi:10.1109/TAI.2000.889845
Jamie Dinkelacker, Dean Nelson, Rob Miller, Pankaj K. Garg. Progressive open source. International Conference on Software Engineering (ICSE), pp. 177-184, 2002. doi:10.1145/581339.581363
Timothy Lethbridge, Nicolas Anquetil. Extracting concepts from file names: a new file clustering criterion. International Conference on Software Engineering (ICSE), pp. 84-93, 1998. doi:10.5555/302163.302172
Y.S. Maarek, D.M. Berry, G.E. Kaiser. An information retrieval approach for automatically constructing software libraries. IEEE Transactions on Software Engineering, vol. 17, pp. 800-813, 1991. doi:10.1109/32.83915
Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, vol. 41, pp. 391-407, 1990. doi:10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
W.B. Frakes, T.P. Pole. An empirical study of representation methods for reusable software components. IEEE Transactions on Software Engineering, vol. 20, pp. 617-630, 1994. doi:10.1109/32.310671