Empirical study of topic modeling in Twitter

Authors: Liangjie Hong, Brian D. Davison

DOI: 10.1145/1964858.1964870

Abstract: Social networks such as Facebook, LinkedIn, and Twitter have been a crucial source of information for a wide spectrum of users. In Twitter, popular information that is deemed important by the community propagates through the network. Studying the characteristics of content in the messages becomes important for a number of tasks, such as breaking news detection, personalized message recommendation, friends recommendation, sentiment analysis and others. While many researchers wish to use standard text mining tools to understand messages on Twitter, the restricted length of those messages prevents them from being employed to their full potential.

We address the problem of using standard topic models in micro-blogging environments by studying how the models can be trained on the dataset. We propose several schemes to train a standard topic model and compare their quality and effectiveness through a set of carefully designed experiments from both qualitative and quantitative perspectives. We show that by training a topic model on aggregated messages we can obtain a higher quality of learned model, which results in significantly better performance in two real-world classification problems. We also discuss how the state-of-the-art Author-Topic model fails to model hierarchical relationships between entities in Social Media.
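The idea of training on aggregated messages can be illustrated with a minimal sketch. It assumes messages are grouped by author into one longer pseudo-document each; the abstract does not state the grouping key, so this choice, the function name `aggregate_by_author`, and the toy data are all illustrative assumptions rather than the authors' exact procedure.

```python
from collections import defaultdict


def aggregate_by_author(messages):
    """Group (author, text) pairs into one pseudo-document per author.

    Assumption: author is the aggregation key; other keys are possible.
    """
    grouped = defaultdict(list)
    for author, text in messages:
        grouped[author].append(text)
    # Join each author's short messages into a single longer document.
    return {author: " ".join(texts) for author, texts in grouped.items()}


# Hypothetical toy data, not taken from the paper.
messages = [
    ("alice", "new phone launch today"),
    ("bob", "great match last night"),
    ("alice", "battery life on the phone is good"),
]
docs = aggregate_by_author(messages)
```

The resulting pseudo-documents, rather than the individual short messages, would then be passed to a standard topic model as its training corpus.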
