Authors: Xiao-Ming ZHANG, Zhou-Jun LI, Wen-Han CHAO
DOI: 10.3724/SP.J.1001.2012.04111
Keywords:
Abstract: With the exponential growth of information on the Internet, it has become increasingly difficult to find and organize relevant material. Topic detection and tracking (TDT) is a research area addressing this problem. As one of the basic tasks of TDT, topic detection is the problem of grouping all stories based on the topics they discuss. This paper proposes a new topic detection method (TPIC) based on an incremental clustering algorithm. The proposed method strives to achieve high accuracy and the capability of estimating the true number of topics in the document corpus. A term reweighting algorithm is used to cluster the given document corpus accurately and efficiently, and a self-refinement process of discriminative feature identification is used to improve the performance of clustering. Furthermore, the topics' "aging" nature is used to precluster stories, and the Bayesian information criterion (BIC) is used to estimate the true number of topics. Experimental results on the Linguistic Data Consortium (LDC) dataset TDT-4 show that the proposed model can improve both efficiency and accuracy.
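
For readers unfamiliar with incremental clustering, the general idea can be illustrated with a minimal single-pass sketch. This is not the TPIC method: term reweighting, the self-refinement step, the "aging"-based preclustering and the BIC estimate are all omitted. The function names `cosine` and `incremental_cluster`, the raw term-count representation and the threshold of 0.3 are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(stories, threshold=0.3):
    """Single-pass clustering over a stream of stories.

    Each story joins the most similar existing topic, or starts a new
    topic when no similarity reaches the (assumed) threshold.
    Returns one topic index per story, in arrival order.
    """
    centroids = []  # one Counter of summed term counts per topic
    labels = []
    for text in stories:
        vec = Counter(text.lower().split())
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the story into the topic
            labels.append(best)
        else:
            centroids.append(vec)  # open a new topic
            labels.append(len(centroids) - 1)
    return labels


stories = [
    "earthquake hits city rescue teams",
    "rescue teams search city after earthquake",
    "stock market falls sharply",
    "market falls as stock prices drop",
]
print(incremental_cluster(stories))
```

In this sketch the number of topics falls out of the threshold. The abstract instead describes estimating it with BIC, which would replace the fixed cutoff with a model-selection step.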