Authors: Guixian Xu, Ziheng Yu, Changzhi Wang, Antai Wang
DOI: 10.1007/S00521-018-3744-2
Keywords:
Abstract: With the development of information technology, Web news has become the main way of information dissemination. News topic discovery is useful for users to quickly find valuable information, and research on it is constantly being improved. Traditional topic discovery is based on the vector space model, but it has defects such as high dimensionality and data sparsity. However, latent semantic analysis can map high-dimensional, sparse words to a k-dimensional semantic space and improve the similarity of texts on the same topic by exploiting the correlation between words. In this paper, news topic discovery is studied. First, the news set is converted to text vectors and the weight of each feature in the texts is calculated by an improved TFIDF. After the original texts are analysed by latent semantic analysis, the semantic relation between words is fully exploited, and the topics are extracted by a clustering approach. For the extraction of sub-topics, word co-occurrence is used to display the sub-topics. In essence, the sub-topics are established through these co-occurring words. The experimental results show that the proposed method can effectively capture the current hot topics and related sub-topics. It is meaningful for the technology of information retrieval and data mining.