Authors: Luca Maria Aiello, Georgios Petkos, Carlos Martin, David Corney, Symeon Papadopoulos
Keywords:
Abstract: Online social and news media generate rich and timely information about real-world events of all kinds. However, the huge amount of data available, along with the breadth of the user base, requires a substantial effort of information filtering to successfully drill down to relevant topics and events. Trending topic detection is therefore a fundamental building block to monitor and summarize information originating from social sources. There are a wide variety of methods and variables, and they greatly affect the quality of results. We compare six topic detection methods on three Twitter datasets related to major events, which differ in their time scale and topic churn rate. We observe how the nature of the event considered, the volume of activity over time, the sampling procedure, and the pre-processing of the data all greatly affect the quality of detected topics, which also depends on the type of detection method used. We find that standard natural language processing techniques can perform well for social streams on very focused topics, but novel techniques designed to mine the temporal distribution of concepts are needed to handle more heterogeneous streams containing multiple stories evolving in parallel. One of the novel methods we propose, based on n-gram co-occurrence and topic ranking, consistently achieves the best performance across all these conditions, thus being more reliable than other state-of-the-art techniques.