Authors: Eun-Joo Lee, Woo-Sung Jung
DOI: 10.9716/KITS.2012.11.4.263
Keywords: Token frequency, Similarity (network science), Web page, Web application, Computer science, Data mining
Abstract: It is becoming hard to maintain web applications because of their high complexity and duplicated pages. However, most research on code clones focuses on code hunks, and its targets are limited to a specific language. Thus, we propose GSIM, a language-independent statistical approach to detect similar pages based on the scarcity and frequency of customized tokens. The tokens, which can be obtained from pages split by a set of given separators, are defined as the atomic elements for calculating the similarity between two pages. In this paper, the domain definition and the algorithms for collecting tokens and making matrices are given. We also conducted experiments on open source codes for evaluation, with our GSIM tool. The results show the applicability of the proposed method and the effects of parameters such as threshold, toughness, and length on quality and performance.
Submitted: July 27, 2012. Revision completed: September 14, 2012. Accepted for publication: October 3, 2012.
* This work was supported by the 2012 academic research support program of Chungbuk National University and by a Kyungpook National University research fund. ** School of Computer Science, College of IT. *** Department of Computer Engineering, College of Electronics and Information, corresponding author.
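The abstract names the ingredients of the approach (tokens obtained by splitting pages on separators, token frequency, token scarcity, and a similarity threshold) but not the formulas. The following is a minimal sketch of that general idea, not the GSIM algorithm itself. The function names, the default separator set, the smoothed inverse-page-frequency weight used for scarcity, and the weighted Jaccard score are all assumptions chosen for illustration. The "toughness" parameter is not modeled because the abstract does not define it.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical default separators; the abstract only says "a set of given separators".
DEFAULT_SEPARATORS = " \t\n<>=\"'/;(){}"


def tokenize(page, separators=DEFAULT_SEPARATORS, min_length=2):
    """Split a page on any run of separator characters and drop short tokens."""
    pattern = "(?:" + "|".join(re.escape(ch) for ch in separators) + ")+"
    return [tok for tok in re.split(pattern, page) if len(tok) >= min_length]


def scarcity_weights(token_counts):
    """Weight each token by how rare it is across pages (assumed IDF-style weight)."""
    n_pages = len(token_counts)
    page_freq = Counter()
    for counts in token_counts.values():
        page_freq.update(counts.keys())  # count pages containing the token, not occurrences
    return {tok: math.log((1 + n_pages) / (1 + pf)) + 1.0 for tok, pf in page_freq.items()}


def similarity(counts_a, counts_b, weights):
    """Scarcity-weighted Jaccard similarity over token frequencies, in [0, 1]."""
    shared = 0.0
    total = 0.0
    for tok in counts_a.keys() | counts_b.keys():
        w = weights.get(tok, 1.0)
        shared += w * min(counts_a[tok], counts_b[tok])  # Counter returns 0 for missing tokens
        total += w * max(counts_a[tok], counts_b[tok])
    return shared / total if total else 0.0


def similar_pairs(pages, threshold=0.8):
    """Return (name_a, name_b, score) for every page pair scoring at or above the threshold."""
    counts = {name: Counter(tokenize(text)) for name, text in pages.items()}
    weights = scarcity_weights(counts)
    result = []
    for a, b in combinations(sorted(counts), 2):
        score = similarity(counts[a], counts[b], weights)
        if score >= threshold:
            result.append((a, b, score))
    return result


if __name__ == "__main__":
    demo_pages = {
        "a.html": "<div class=item>price</div>",
        "b.html": "<div class=item>price</div>",
        "c.sql": "SELECT name FROM users",
    }
    for name_a, name_b, score in similar_pairs(demo_pages, threshold=0.8):
        print(name_a, name_b, round(score, 3))
```

Because tokenization depends only on the separator set, the same sketch applies to HTML, SQL, or script files without a language-specific parser, which is the sense in which such an approach is language-independent. Raising `threshold` or `min_length` trades recall for fewer, stricter matches.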