Emerging topic detection using dictionary learning

Authors: Shiva Prasad Kasiviswanathan, Prem Melville, Arindam Banerjee, Vikas Sindhwani

DOI: 10.1145/2063576.2063686

Abstract: Streaming user-generated content in the form of blogs, microblogs, forums, and multimedia sharing sites provides a rich source of data from which invaluable information and insights may be gleaned. Given the vast volume of such social media data being continually generated, one of the challenges is to automatically tease apart the emerging topics of discussion from the constant background chatter. Such emerging topics can be identified by the appearance of multiple posts on a unique subject matter that is distinct from previous online discourse. We address the problem of identifying emerging topics through the use of dictionary learning. We propose a two-stage approach, based respectively on the detection and the clustering of novel user-generated content. We derive a scalable approach by using the alternating directions method to solve the resulting optimization problems. Empirical results show that our proposed approach is more effective than several baselines in detecting emerging topics in traditional news story and newsgroup data. We also demonstrate the practical application of our approach to social media analysis, based on a study of streaming data from Twitter.
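The abstract names the ingredients (a dictionary, a detection stage for novel content, and the alternating directions method) without giving formulas. The block below is a minimal, hypothetical sketch of how dictionary-based novelty detection could look. It rests on several assumptions that do not come from the paper: the dictionary `A` is fixed (its columns are "atoms" over terms), a document is scored by the lasso objective 0.5·‖Ay − x‖² style reconstruction, specifically min over x of 0.5‖Ax − y‖² + λ‖x‖₁, solved with ADMM, and a document is flagged as novel when that objective exceeds a hand-picked threshold. The paper's actual loss, dictionary updates, and thresholding rule may differ. The names `lasso_admm` and `novelty_score`, and the toy data, are illustrative only. It is written in pure Python so it runs without NumPy.

```python
# Hypothetical sketch: flag a document as novel when a sparse combination of
# dictionary atoms reconstructs it poorly. Assumptions (not from the paper):
# squared-error loss + L1 penalty, a fixed dictionary, a hand-picked threshold.


def mat_vec(A, x):
    # Compute A @ x for a matrix stored as a list of rows.
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(M, b):
    # Solve M x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def soft_threshold(v, k):
    # Proximal operator of k * |.|, applied elementwise.
    return [max(abs(t) - k, 0.0) * (1.0 if t > 0 else -1.0) for t in v]


def lasso_admm(A, y, lam, rho=1.0, iters=200):
    # ADMM for min_x 0.5*||A x - y||^2 + lam*||x||_1, using the split x = z.
    At = transpose(A)  # k x d, where k = number of atoms
    k = len(At)
    # A^T A + rho*I does not change across iterations, so build it once.
    M = [[sum(a * b for a, b in zip(At[i], At[j])) + (rho if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    Aty = mat_vec(At, y)
    z = [0.0] * k
    u = [0.0] * k  # scaled dual variable
    for _ in range(iters):
        x = solve(M, [Aty[i] + rho * (z[i] - u[i]) for i in range(k)])
        z = soft_threshold([x[i] + u[i] for i in range(k)], lam / rho)
        u = [u[i] + x[i] - z[i] for i in range(k)]
    return z


def novelty_score(A, y, lam):
    # Objective value at the sparse code: a large value means y is poorly explained.
    x = lasso_admm(A, y, lam)
    r = [a - b for a, b in zip(mat_vec(A, x), y)]
    return 0.5 * sum(t * t for t in r) + lam * sum(abs(t) for t in x)


if __name__ == "__main__":
    # Toy dictionary over 5 terms with 2 atoms; term 5 is not covered by any atom.
    A = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
    threshold = 0.5  # hypothetical cut-off
    for name, doc in [("old", [1.0, 1.0, 0.0, 0.0, 0.0]),
                      ("new", [0.0, 0.0, 0.0, 0.0, 2.0])]:
        score = novelty_score(A, doc, lam=0.1)
        print(name, round(score, 4), "novel" if score > threshold else "known")
```

In this toy run the document built from an existing atom scores roughly 0.1, while the document that uses only the uncovered term scores 2.0 and is flagged as novel. ADMM is used here because the split x = z separates the smooth least-squares term (a fixed linear solve) from the L1 term (a closed-form soft-threshold), which is one common way to make such problems scale. The clustering stage mentioned in the abstract is not sketched.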
