Authors: Xuanhui Wang, Dou Shen, Hua-Jun Zeng, Zheng Chen, Wei-Ying Ma
Keywords: Factor (programming language), Feature vector, Cluster analysis, Computer science, Web page, Data mining, Information retrieval, Representation (mathematics), Automatic summarization, Latent semantic analysis, HITS algorithm
Abstract: Traditional Web page clustering algorithms use the full text of documents to generate feature vectors. Such methods often produce unsatisfactory results because Web pages contain much noisy information, such as decoration, interaction, and advertisement content. The varying length of pages is also a significant negative factor affecting performance. In this paper, we investigate several summarization techniques to tackle these issues when clustering Web pages. Compared with the full-text representation of pages, our experimental results indicate that the proposed approach effectively solves the problems of noisy information and varying length, and thus significantly boosts clustering performance.
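The following is a minimal, hypothetical sketch of the general pipeline the abstract describes. Each page is summarized, feature vectors are built from the summaries instead of the full text, and the vectors are then clustered. It is not the paper's method. The summarizer used here is a simple stand-in that keeps the `k` sentences whose terms are most frequent within the page. It does not reproduce any of the summarization techniques evaluated in the paper. The stopword list, the toy pages, and the helper names `summarize`, `tfidf_vectors`, and `spherical_kmeans` are all illustrative assumptions. Vectors are L2-normalised so that page length does not dominate cosine similarity. This is one common way to address the varying-length issue.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tiny stopword list, just large enough for the toy pages below.
STOPWORDS = {"a", "at", "before", "for", "here", "in", "of", "our", "the", "to", "until", "with"}


def tokenize(text):
    """Lower-case word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def summarize(page, k=2):
    """Hypothetical stand-in summarizer (not the paper's technique): keep the k
    sentences whose terms are most frequent within the page, in original order."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", page) if s.strip()]
    page_tf = Counter(tokenize(page))

    def score(sentence):
        toks = tokenize(sentence)
        return sum(page_tf[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return ". ".join(sentences[i] for i in sorted(ranked[:k]))


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (smoothed IDF), L2-normalised so that document
    length does not dominate similarity."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        v = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vectors.append({t: x / norm for t, x in v.items()})
    return vectors


def dot(u, v):
    """Dot product of two sparse vectors (cosine similarity if both are unit length)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(x * v.get(t, 0.0) for t, x in u.items())


def spherical_kmeans(vectors, k, iters=10):
    """Cosine k-means, naively seeded with the first k vectors for determinism."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        for c in range(k):
            acc = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == c:
                    for t, x in v.items():
                        acc[t] += x
            norm = math.sqrt(sum(x * x for x in acc.values()))
            if norm > 0:  # an empty cluster keeps its previous centroid
                centroids[c] = {t: x / norm for t, x in acc.items()}
    return labels


# Toy pages: two topics, each page ending with the same boilerplate sentence.
pages = [
    "The striker scored a late goal. The football match ended two to one. "
    "Fans cheered the football team. Click here to subscribe to our newsletter.",
    "Whisk the eggs with sugar. Bake the cake batter for thirty minutes. "
    "Let the cake cool before serving. Click here to subscribe to our newsletter.",
    "The goalkeeper saved a penalty in the football match. The football team won "
    "the league. Click here to subscribe to our newsletter.",
    "Add flour to the cake batter. Bake the cake until golden. "
    "Click here to subscribe to our newsletter.",
]

summaries = [summarize(p) for p in pages]
labels = spherical_kmeans(tfidf_vectors(summaries), k=2)
print(labels)  # [0, 1, 0, 1]
```

In this toy run the boilerplate sentence is dropped from every summary, and the two sports pages and the two cooking pages end up in separate clusters. A real system would replace `summarize` with a stronger summarization method and would use a more careful seeding strategy than taking the first `k` documents.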