Authors: Diego Reforgiato Recupero, Salvatore Carta, Luca Piras, Sergio Consoli, Alessandro Sebastian Podda
DOI: 10.7717/PEERJ-CS.438
Keywords:
Abstract: In the current age of overwhelming information and massive production of textual data on the Web, Event Detection has become an increasingly important task in various application domains. Several research branches have been developed to tackle the problem from different perspectives, including Natural Language Processing and Big Data analysis, with the goal of providing valuable resources to support decision-making in a wide variety of fields. In this paper, we propose a real-time, domain-specific, clustering-based event-detection approach that integrates information coming, on the one hand, from traditional newswires and, on the other hand, from microblogging platforms. The implemented pipeline is twofold: (i) it provides the user with insights about relevant events reported in the press on a daily basis; (ii) it alerts the user about potentially impactful events, referred to as hot events, for specific tasks or domains of interest. The algorithm identifies clusters of related news stories published by globally renowned press sources, which guarantee authoritative, noise-free information about current affairs; subsequently, content extracted from microblogs is associated with these clusters in order to assess the relevance of each event in public opinion. To identify the events of a day d, we create a lexicon by looking at news articles and stock data from the previous days, up to day d-1. Although the approach can be extended to other domains (e.g., politics, economy, sports), we hereby present an implementation for the financial sector. We validated our solution through a qualitative and quantitative evaluation performed on the Dow Jones' Data, News and Analytics dataset, on a stream of messages extracted from the microblogging platform Stocktwits, and on the Standard & Poor's 500 index time series. The experiments demonstrate the effectiveness of our proposal in extracting meaningful information from real-world events and in spotting hot events in the financial sphere. An added value of the evaluation is given by the visual inspection of a selected number of significant real-world events, starting from the Brexit Referendum and reaching until the recent outbreak of the Covid-19 pandemic in early 2020.
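The abstract does not specify how the lexicon is scored, which clustering algorithm is used, or what makes an event "hot", so the following is only a minimal stdlib Python sketch under stated assumptions, not the authors' method. It illustrates the pipeline shape described above: the lexicon for day d is built strictly from data up to day d-1 (no look-ahead), day-d articles are clustered, and microblog messages are attached to clusters to flag busy ones. The scoring rule (mean absolute index return on days a word appears), the greedy cosine clustering (a simple stand-in), all thresholds, and every function name (`build_lexicon`, `cluster`, `hot_events`, etc.) are hypothetical.

```python
import math
from collections import Counter
from datetime import date, timedelta

# Hypothetical inputs (assumptions, not from the paper):
#   news[day]    -> list of article texts published on that day
#   returns[day] -> that day's index return, e.g. S&P 500 close-to-close

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def build_lexicon(news, returns, d: date, window=30, min_count=2, top_k=50):
    """Build the lexicon for day d from days in [d - window, d) only.

    Assumed scoring rule: mean absolute index return over the days on which
    a word appears; the top_k words seen on >= min_count days are kept.
    """
    start = d - timedelta(days=window)
    totals, counts = Counter(), Counter()
    for day, articles in news.items():
        if not (start <= day < d) or day not in returns:
            continue  # never use day d or later (no look-ahead)
        for w in {w for text in articles for w in tokenize(text)}:
            totals[w] += abs(returns[day])
            counts[w] += 1
    scores = {w: totals[w] / counts[w] for w in counts if counts[w] >= min_count}
    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:top_k])


def vectorize(text, lexicon):
    """Bag-of-words vector restricted to lexicon terms."""
    return Counter(w for w in tokenize(text) if w in lexicon)


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy single-pass cosine clustering (a stand-in, not the paper's).

    Each vector joins the most similar cluster if its similarity to that
    centroid is >= threshold, else it starts a new cluster. Centroids are
    summed counts, which is fine because cosine ignores vector length.
    """
    clusters = []
    for i, v in enumerate(vectors):
        if not v:
            continue  # article shares no terms with the lexicon
        best, best_sim = None, threshold
        for c in clusters:
            s = cosine(v, c["centroid"])
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append({"centroid": Counter(v), "members": [i]})
        else:
            best["centroid"].update(v)
            best["members"].append(i)
    return clusters


def hot_events(clusters, messages, lexicon, sim_threshold=0.3, min_messages=3):
    """Attach each message to its closest cluster; return busy cluster indices."""
    counts = [0] * len(clusters)
    for text in messages:
        v = vectorize(text, lexicon)
        sims = [cosine(v, c["centroid"]) for c in clusters]
        if sims and max(sims) >= sim_threshold:
            counts[sims.index(max(sims))] += 1
    return [i for i, n in enumerate(counts) if n >= min_messages]
```

The strict `day < d` comparison is the point of the d-1 cutoff: a term that first appears in the news on day d cannot enter the lexicon used to detect that day's events, so the detector never peeks at the information it is meant to discover.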