Abstract: Online social and news media generate rich and timely information about real-world events of all kinds. However, the huge amount of data available, along with the breadth of the user base, requires a substantial effort of information filtering to successfully drill down to relevant topics and events. Trending topic detection is therefore a fundamental building block to monitor and summarize information originating from social sources. There is a wide variety of methods and variables, and they greatly affect the quality of results. We compare six topic detection methods on three Twitter datasets related to major events, which differ in their time scale and topic churn rate. We observe how the nature of the event considered, the volume of activity over time, the sampling procedure and the pre-processing of the data all greatly affect the quality of detected topics, which also depends on the type of detection method used. We find that standard natural language processing techniques can perform well for social streams on very focused topics, but novel techniques designed to mine the temporal distribution of concepts are needed to handle more heterogeneous streams containing multiple stories evolving in parallel. One of the novel methods we propose, based on n-grams co-occurrence and topic ranking, consistently achieves the best performance across all these conditions, thus being more reliable than other state-of-the-art techniques.
