Event detection in finance using hierarchical clustering algorithms on news and tweets.

Authors: Diego Reforgiato Recupero, Salvatore Carta, Luca Piras, Sergio Consoli, Alessandro Sebastian Podda

DOI: 10.7717/PEERJ-CS.438

Keywords:

Abstract: In the current age of overwhelming information and massive production of textual data on the Web, Event Detection has become an increasingly important task in various application domains. Several research branches have been developed to tackle the problem from different perspectives, including Natural Language Processing and Big Data analysis, with the goal of providing valuable resources to support decision-making in a wide variety of fields. In this paper, we propose a real-time domain-specific clustering-based event-detection approach that integrates textual information coming, on one hand, from traditional newswires and, on the other hand, from microblogging platforms. The goal of the implemented pipeline is twofold: (i) providing insights to the user about the relevant events that are reported in the press on a daily basis; (ii) alerting the user about potentially important and impactful events, referred to as hot events, for some specific tasks or domains of interest. The algorithm identifies clusters of related news stories published by globally renowned press sources, which guarantee authoritative, noise-free information about current affairs; subsequently, the content extracted from microblogs is associated to the clusters in order to gain an assessment of the relevance of the event in the public opinion. To identify the events of a day d, we create the lexicon by looking at news articles and stock data of previous days up to d-1. Although the approach can be extended to a variety of domains (e.g. politics, economy, sports), we hereby present a specific implementation in the financial sector. We validated our solution through a qualitative and quantitative evaluation, performed on the Dow Jones' Data, News and Analytics dataset, on a stream of messages extracted from the microblogging platform Stocktwits, and on the Standard & Poor's 500 index time-series. The experiments demonstrate the effectiveness of our proposal in extracting meaningful information from real-world events and in spotting hot events in the financial sphere. An added value of the evaluation is given by the visual inspection of a selected number of significant real-world events, starting from the Brexit Referendum and reaching until the recent outbreak of the Covid-19 pandemic in early 2020.
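
The news-clustering step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the whitespace tokenisation, the smoothed TF-IDF weighting, the average-linkage criterion, the `max_distance` cut-off of 0.9, and the four sample headlines are all assumptions chosen for illustration, and the function names (`tfidf_vectors`, `cosine_distance`, `agglomerative_clusters`) are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def agglomerative_clusters(vectors, max_distance=0.9):
    """Bottom-up average-linkage clustering, stopped at max_distance."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if dist > max_distance:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Made-up headlines, used only to exercise the sketch.
headlines = [
    "uk votes to leave the european union in brexit referendum",
    "brexit referendum result sends the pound lower",
    "oil prices climb after opec agrees output cut",
    "opec output cut lifts oil prices",
]
clusters = agglomerative_clusters(tfidf_vectors(headlines))
print(clusters)  # [[0, 1], [2, 3]]
```

Each returned cluster is a list of headline indices, so the two Brexit stories and the two OPEC stories end up grouped separately. Associating microblog posts with the clusters, as the abstract describes, could reuse `cosine_distance` against each cluster's members, but that step is not shown here.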

References (34)
Sergio Consoli, Kenneth Darby-Dowman, Gijs Geleijnse, Jan Korst, Steffen Pauws, Heuristic Approaches for the Quartet Method of Hierarchical Clustering. IEEE Transactions on Knowledge and Data Engineering, vol. 22, pp. 1428-1443 (2010), 10.1109/TKDE.2009.188
Jianqing Fan, Jinchi Lv, Sure independence screening for ultrahigh dimensional feature space. Journal of The Royal Statistical Society Series B-statistical Methodology, vol. 70, pp. 849-911 (2008), 10.1111/J.1467-9868.2008.00674.X
Alexander Hogenboom, Frederik Hogenboom, Flavius Frasincar, Kim Schouten, Otto van der Meer, Semantics-based information extraction for detecting economic events. Multimedia Tools and Applications, vol. 64, pp. 27-52 (2013), 10.1007/S11042-012-1122-0
G. Salton, A. Wong, C. S. Yang, A vector space model for automatic indexing. Communications of the ACM, vol. 18, pp. 613-620 (1975), 10.1145/361219.361220
Geoffrey Hinton, Laurens van der Maaten, Visualizing Data using t-SNE. Journal of Machine Learning Research, vol. 9, pp. 2579-2605 (2008)
Nicholas Thapen, Donal Simmie, Chris Hankin, The early bird catches the term: combining twitter and news data for event detection and situational awareness. Journal of Biomedical Semantics, vol. 7, pp. 61-61 (2016), 10.1186/S13326-016-0103-Z
Wei Xie, Feida Zhu, Jing Jiang, Ee-Peng Lim, Ke Wang, TopicSketch: Real-Time Bursty Topic Detection from Twitter. IEEE Transactions on Knowledge and Data Engineering, vol. 28, pp. 2216-2229 (2016), 10.1109/TKDE.2016.2556661
Mariana Daniel, Rui Ferreira Neves, Nuno Horta, Company event popularity for financial markets using Twitter and sentiment analysis. Expert Systems With Applications, vol. 71, pp. 111-124 (2017), 10.1016/J.ESWA.2016.11.022
Mahmud Hasan, Mehmet A Orgun, Rolf Schwitter, A survey on real-time event detection from the Twitter data stream. Journal of Information Science, vol. 44, pp. 443-463 (2018), 10.1177/0165551517698564
Linmei Hu, Bin Zhang, Lei Hou, Juanzi Li, Adaptive online event detection in news streams. Knowledge Based Systems, vol. 138, pp. 105-112 (2017), 10.1016/J.KNOSYS.2017.09.039