Streaming First Story Detection with application to Twitter

Authors: Miles Osborne, Saša Petrović, Victor Lavrenko


Keywords: Data mining; Task (project management); Information retrieval; Popularity; Hash function; Social media; Event (computing); Computer science

Abstract: With the recent rise in popularity and size of social media, there is a growing need for systems that can extract useful information from this amount of data. We address the problem of detecting new events from a stream of Twitter posts. To make event detection feasible on web-scale corpora, we present an algorithm based on locality-sensitive hashing which is able to overcome the limitations of traditional approaches, while maintaining competitive results. In particular, a comparison with a state-of-the-art system on the first story detection task shows that we achieve over an order of magnitude speedup in processing time, while retaining comparable performance. Event detection experiments on a collection of 160 million Twitter posts show that celebrity deaths are the fastest spreading news on Twitter.
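To make the idea in the abstract concrete, the following is a minimal sketch of streaming first story detection with locality-sensitive hashing. It is not the authors' implementation. It assumes random-hyperplane hashing in the style of Charikar (2002), listed in the references, applied to L2-normalized term-frequency vectors. It also assumes that a post's novelty is 1 minus its highest cosine similarity to an earlier post found in a shared bucket. The class name `LSHFirstStoryDetector`, the parameters `num_bits`, `num_tables`, `bucket_cap` and `recent`, and the extra comparison against the most recent posts are all hypothetical illustration choices, not values or details taken from the paper.

```python
import math
import random
from collections import defaultdict, deque


class LSHFirstStoryDetector:
    """Hypothetical sketch: LSH-based novelty scoring over a stream of posts."""

    def __init__(self, num_bits=13, num_tables=10, bucket_cap=50, recent=100, seed=0):
        self.num_bits = num_bits        # hyperplanes (hash bits) per table
        self.num_tables = num_tables    # independent hash tables
        self.rng = random.Random(seed)
        # One Gaussian component per (term, table, bit), drawn lazily on first sight,
        # so the hyperplanes cover an open-ended vocabulary.
        self.planes = {}
        # Each bucket keeps only its newest `bucket_cap` post ids (bounded memory per bucket).
        self.tables = [defaultdict(lambda: deque(maxlen=bucket_cap)) for _ in range(num_tables)]
        # Assumed fallback: always compare against the `recent` newest posts as well.
        self.recent = deque(maxlen=recent)
        # Stored vectors; this sketch never evicts them, a real stream system would.
        self.docs = {}

    def _vector(self, text):
        """Whitespace-tokenized, L2-normalized term-frequency vector."""
        tf = defaultdict(float)
        for tok in text.lower().split():
            tf[tok] += 1.0
        norm = math.sqrt(sum(v * v for v in tf.values()))
        return {t: v / norm for t, v in tf.items()} if norm else {}

    def _keys(self, vec):
        """One bucket key per table: the sign pattern of dot products with its hyperplanes."""
        width = self.num_tables * self.num_bits
        dots = [0.0] * width
        for term, weight in vec.items():
            if term not in self.planes:
                self.planes[term] = [self.rng.gauss(0.0, 1.0) for _ in range(width)]
            comps = self.planes[term]
            for i in range(width):
                dots[i] += weight * comps[i]
        return [
            tuple(d >= 0.0 for d in dots[t * self.num_bits:(t + 1) * self.num_bits])
            for t in range(self.num_tables)
        ]

    @staticmethod
    def _cosine(a, b):
        """Cosine similarity of two unit vectors is just their dot product."""
        if len(a) > len(b):
            a, b = b, a
        return sum(w * b.get(t, 0.0) for t, w in a.items())

    def process(self, doc_id, text):
        """Score a post's novelty in [0, 1], then index it for later posts."""
        vec = self._vector(text)
        keys = self._keys(vec)
        # Candidates: posts colliding in any table, plus the most recent posts.
        candidates = set(self.recent)
        for table, key in zip(self.tables, keys):
            candidates.update(table[key])
        best = max((self._cosine(vec, self.docs[c]) for c in candidates), default=0.0)
        for table, key in zip(self.tables, keys):
            table[key].append(doc_id)
        self.recent.append(doc_id)
        self.docs[doc_id] = vec
        return 1.0 - best
```

Usage: `det = LSHFirstStoryDetector(seed=42)`, then call `det.process(i, post_text)` for each incoming post. Posts with a score close to 1 have no close earlier neighbour and are first-story candidates. The design choice that matters for speed is that each post is compared only with the few posts in its buckets (and the small recent window), not with the whole history. That per-post cost is what makes the approach scale to a stream the size of the abstract's 160-million-post collection.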

References (18)
Lise Getoor, Barna Saha. On Maximum Coverage in the Streaming Model & Application to Multi-topic Blog-Watch. SIAM International Conference on Data Mining, pp. 697-708 (2009).
Natalie S. Glance, Matthew Hurst, Takashi Tomokiyo. BlogPulse: Automated Trend Discovery for Weblogs (2003).
Nick Koudas, Nilesh Bansal. BlogScope: A System for Online Analysis of High Volume Text Streams. Very Large Data Bases (VLDB), pp. 1410-1413 (2007).
James Allan. Topic Detection and Tracking: Event-based Information Organization. Kluwer Academic Publishers (2002).
Daniel Gruhl, R. Guha, Ravi Kumar, Jasmine Novak, Andrew Tomkins. The Predictive Power of Online Chatter. Knowledge Discovery and Data Mining (KDD), pp. 78-87 (2005). DOI: 10.1145/1081870.1081883
Yiming Yang, Tom Pierce, Jaime Carbonell. A Study of Retrospective and On-line Event Detection. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 28-36 (1998). DOI: 10.1145/290941.290953
Balachander Krishnamurthy, Phillipa Gill, Martin Arlitt. A Few Chirps about Twitter. Proceedings of the First Workshop on Online Social Networks (WOSN '08), pp. 19-24 (2008). DOI: 10.1145/1397735.1397741
Moses S. Charikar. Similarity Estimation Techniques from Rounding Algorithms. Symposium on the Theory of Computing (STOC), pp. 380-388 (2002). DOI: 10.1145/509907.509965
Akshay Java, Xiaodan Song, Tim Finin, Belle Tseng. Why We Twitter. Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis (WebKDD/SNA-KDD '07), pp. 56-65 (2007). DOI: 10.1145/1348549.1348556