Research of Automatic Topic Detection Based on Incremental Clustering

Authors: Xiao-Ming ZHANG, Zhou-Jun LI, Wen-Han CHAO

DOI: 10.3724/SP.J.1001.2012.04111

Abstract: With the exponential growth of information on the Internet, it has become increasingly difficult to find and organize relevant material. Topic detection and tracking (TDT) is a research area addressing this problem. As one of the basic tasks of TDT, topic detection is the problem of grouping all stories according to the topics they discuss. This paper proposes a new topic detection method (TPIC) based on an incremental clustering algorithm. The proposed method strives to achieve both high accuracy and the capability of estimating the true number of topics in the document corpus. A term reweighting algorithm is used to cluster the given corpus accurately and efficiently, and a self-refinement process with discriminative feature identification is used to improve clustering performance. Furthermore, the "aging" nature of topics is used to precluster stories, and the Bayesian information criterion (BIC) is used to estimate the number of topics. Experimental results on the Linguistic Data Consortium (LDC) TDT-4 dataset show that the model can improve both efficiency and accuracy.
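
The abstract does not spell out how TPIC assigns stories to topics. For orientation only, a generic single-pass incremental clustering baseline, a common starting point in TDT work, can be sketched as follows. This is not TPIC: the raw term-frequency representation, the `threshold` value, and the exponential `decay` used as a stand-in for topic "aging" are hypothetical choices made for illustration, and the paper's term reweighting, self-refinement, and BIC-based estimation of the number of topics are not shown. Each cluster keeps a summed term vector; because cosine similarity is scale-invariant, this gives the same similarity as a mean centroid.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector: lowercase, whitespace tokenization, no reweighting."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(stories, threshold=0.3, decay=0.0):
    """Generic single-pass incremental clustering (NOT the TPIC algorithm).

    stories:   (time, text) pairs, assumed sorted by time.
    threshold: hypothetical minimum similarity for a story to join a cluster.
    decay:     hypothetical "aging" rate; a cluster's similarity is scaled by
               exp(-decay * gap), where gap is the time since its latest story.
    Returns one cluster id per story, in input order.
    """
    centroids = []  # summed term-frequency vector of each cluster
    last_seen = []  # time of the most recent story added to each cluster
    labels = []
    for time, text in stories:
        vec = tf_vector(text)
        best, best_sim = None, threshold
        for cid, centroid in enumerate(centroids):
            sim = cosine(vec, centroid) * math.exp(-decay * (time - last_seen[cid]))
            if sim >= best_sim:
                best, best_sim = cid, sim
        if best is None:
            # No cluster is similar enough: the story opens a new topic.
            centroids.append(Counter(vec))
            last_seen.append(time)
            labels.append(len(centroids) - 1)
        else:
            # Fold the story into the best-matching cluster.
            centroids[best].update(vec)
            last_seen[best] = time
            labels.append(best)
    return labels


stories = [
    (0, "earthquake hits city"),
    (1, "city earthquake death toll rises"),
    (2, "election results announced"),
    (3, "earthquake rescue in city continues"),
]
print(single_pass_cluster(stories))             # [0, 0, 1, 0]
print(single_pass_cluster(stories, decay=1.0))  # [0, 1, 2, 3]
```

With no decay, the three earthquake stories share cluster 0 and the election story opens cluster 1. With `decay=1.0`, the time gaps shrink every similarity below the threshold, so each story starts its own cluster, which shows how strongly a hypothetical aging factor like this interacts with the similarity threshold.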

参考文章(12)
Young-Woo Seo, Katia Sycara, Text clustering for topic detection Defense Technical Information Center. ,(2004) , 10.21236/ADA599196
Virach Sornlertlamvanich, Hitoshi Isahara, Canasai Kruengkrai, Refining a divisive partitioning algorithm for unsupervised clustering hybrid intelligent systems. pp. 535- 542 ,(2003)
John Dunnion, Cormac Flynn, Topic Detection in the news domain Proceedings of the 2004 international symposium on Information and communication technologies. pp. 103- 108 ,(2004) , 10.5555/1071509.1071530
Yookyung Jo, Carl Lagoze, C. Lee Giles, Detecting research topics via the correlation between graphs and texts Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '07. pp. 370- 379 ,(2007) , 10.1145/1281192.1281234
T. L. Griffiths, M. Steyvers, Finding scientific topics Proceedings of the National Academy of Sciences of the United States of America. ,vol. 101, pp. 5228- 5235 ,(2004) , 10.1073/PNAS.0307752101
Qiaozhu Mei, ChengXiang Zhai, Discovering evolutionary theme patterns from text: an exploration of temporal text mining knowledge discovery and data mining. pp. 198- 207 ,(2005) , 10.1145/1081870.1081895
Y. Yang, J.G. Carbonell, R.D. Brown, T. Pierce, B.T. Archibald, X. Liu, Learning approaches for detecting and tracking news events IEEE Intelligent Systems & Their Applications. ,vol. 14, pp. 32- 43 ,(1999) , 10.1109/5254.784083
M David, J Blei, D Lafferty, Correlated Topic Models neural information processing systems. ,vol. 18, pp. 147- 154 ,(2005)
James Allan, Jaime Carbonell, Jonathan Yamron, Yiming Yang, George Doddington, Topic Detection and Tracking Pilot Study Final Report Proceedings of the Broadcast News Transcription and Understanding Workshop (Sponsored by DARPA). ,(1998) , 10.1184/R1/6626252.V1
E. Erosheva, S. Fienberg, J. Lafferty, Mixed-membership models of scientific publications Proceedings of the National Academy of Sciences of the United States of America. ,vol. 101, pp. 5220- 5227 ,(2004) , 10.1073/PNAS.0307760101