Authors: Zhongming Han, Qian Mo, Hongzhi Liu, Jianzhi Sun
DOI: 10.1109/ICDIM.2009.5356801
Keywords:
Abstract: The Internet contains a large number of redundant web pages. In this paper, we present a novel multilayer framework for detecting duplicated web pages based on tag statistics and text similarity comparison. We propose two paragraph detection algorithms to implement the framework. The experimental results show that the approach achieves high performance, which means that duplicated web pages can be detected efficiently by simple comparison.
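The abstract gives no algorithmic details. The following is only a minimal sketch of how a two-layer check of this kind might look. It assumes cosine similarity over HTML start-tag counts as the tag-statistics layer, and `difflib.SequenceMatcher` ratios over extracted paragraphs as the text-similarity layer. The function names, thresholds, and paragraph-splitting rule are all hypothetical and are not the authors' algorithms.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _PageParser(HTMLParser):
    """Collects start-tag counts and visible text paragraphs (hypothetical helper)."""

    # Assumption: these tags mark paragraph boundaries.
    BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "h4", "h5", "h6", "br"}

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.paragraphs = []
        self._text_chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1  # tag statistics for layer 1
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._text_chunks.append(data)

    def _flush(self):
        # Join buffered text, collapse whitespace, and store it as one paragraph.
        text = " ".join("".join(self._text_chunks).split())
        if text:
            self.paragraphs.append(text)
        self._text_chunks = []

    def close(self):
        super().close()
        self._flush()


def parse(html):
    parser = _PageParser()
    parser.feed(html)
    parser.close()
    return parser.tag_counts, parser.paragraphs


def cosine(a, b):
    """Cosine similarity of two tag-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def paragraph_overlap(pa, pb, para_threshold=0.9):
    """Fraction of paragraphs in the shorter list with a near match in the other."""
    if not pa or not pb:
        return 0.0
    short, long_ = (pa, pb) if len(pa) <= len(pb) else (pb, pa)
    matched = sum(
        1 for p in short
        if any(SequenceMatcher(None, p, q).ratio() >= para_threshold for q in long_)
    )
    return matched / len(short)


def is_duplicate(html_a, html_b, tag_threshold=0.8, text_threshold=0.8):
    tags_a, paras_a = parse(html_a)
    tags_b, paras_b = parse(html_b)
    # Layer 1: cheap structural filter on tag statistics.
    if cosine(tags_a, tags_b) < tag_threshold:
        return False
    # Layer 2: paragraph-level text comparison for pages that pass layer 1.
    return paragraph_overlap(paras_a, paras_b) >= text_threshold


if __name__ == "__main__":
    page1 = "<html><body><p>Hello world.</p><p>Second paragraph here.</p></body></html>"
    page2 = "<html><body><p>Hello world!</p><p>Second paragraph here.</p></body></html>"
    page3 = "<html><body><table><tr><td>x</td></tr></table></body></html>"
    print(is_duplicate(page1, page2))  # True
    print(is_duplicate(page1, page3))  # False
```

In this sketch, the inexpensive tag-count comparison runs first so that pages with clearly different layouts are rejected before the more costly pairwise paragraph comparison runs. This is one plausible reading of "multilayer".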