A frequent keyword-set based algorithm for topic modeling and clustering of research papers

Authors: Kumar Shubankar, Aditya Pratap Singh, Vikram Pudi

DOI: 10.1109/DMO.2011.5976511

Abstract: In this paper, we introduce a novel and efficient approach to detect topics in a large corpus of research papers. With the rapidly growing size of academic literature, the problem of topic detection has become a very challenging task. We present a unique approach that uses closed frequent keyword-sets to form topics. Our approach also provides a natural method to cluster the papers into hierarchical, overlapping clusters, using the topic as the similarity measure. To rank the papers in a topic cluster, we devise a modified PageRank algorithm that assigns an authoritative score to each paper by considering the sub-graph in which the paper appears. We test our algorithms on the DBLP dataset and experimentally show that they are fast, effective, and scalable.
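To make the idea in the abstract concrete, the following is a minimal sketch of how closed frequent keyword-sets could serve as topics and induce overlapping clusters. It is not the authors' algorithm: it enumerates subsets by brute force, which is only practical for toy data. The function names, the `min_support` parameter, and the sample keywords are all assumptions made for illustration.

```python
from itertools import combinations


def closed_frequent_keyword_sets(papers, min_support):
    """Return {keyword_set: support} for closed frequent keyword-sets.

    papers: list of sets of keywords, one set per paper (hypothetical input).
    min_support: minimum number of papers that must contain a keyword-set.
    """
    # Brute-force support counting: every non-empty subset of a paper's
    # keywords is counted once for that paper.
    support = {}
    for keywords in papers:
        for size in range(1, len(keywords) + 1):
            for subset in combinations(sorted(keywords), size):
                key = frozenset(subset)
                support[key] = support.get(key, 0) + 1

    frequent = {s: c for s, c in support.items() if c >= min_support}

    # A frequent set is closed if no proper superset has the same support.
    # Any such superset would itself be frequent, so checking `frequent`
    # is sufficient.
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < t and c == frequent[t] for t in frequent)
    }


def cluster_papers(papers, topics):
    """Map each topic (keyword-set) to the indices of papers containing it."""
    return {
        topic: [i for i, keywords in enumerate(papers) if topic <= keywords]
        for topic in topics
    }


# Toy corpus: each paper is represented only by its keyword set.
papers = [
    {"topic", "clustering"},
    {"topic", "clustering", "pagerank"},
    {"topic", "pagerank"},
    {"mining"},
]

topics = closed_frequent_keyword_sets(papers, min_support=2)
clusters = cluster_papers(papers, topics)
```

On this toy corpus, the keyword "clustering" alone is frequent but not closed, because the larger set {"topic", "clustering"} occurs in exactly the same papers. The resulting clusters overlap (paper 1 belongs to all three) and nest (the cluster for {"topic", "clustering"} is contained in the cluster for {"topic"}), which illustrates the hierarchical, overlapping structure the abstract describes.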
