Explainable Machine Learning Exploiting News and Domain-Specific Lexicon for Stock Market Forecasting

Authors: Diego Reforgiato Recupero, Salvatore M. Carta, Luca Piras, Sergio Consoli, Alessandro Sebastian Podda

DOI: 10.1109/ACCESS.2021.3059960

Keywords: Stock market, Binary classification, Feature engineering, Decision tree learning, Index (economics), Computer science, Artificial intelligence, Set (abstract data type), Machine learning, Feature extraction

Abstract: In this manuscript, we propose a Machine Learning approach to tackle a binary classification problem whose goal is to predict the magnitude (high or low) of future stock price variations for individual companies of the S&P 500 index. Sets of lexicons are generated from globally published articles with the goal of identifying the most impactful words on the market in a specific time interval and within a certain business sector. A feature engineering process is then performed out of the generated lexicons, and the obtained features are fed to a Decision Tree classifier. The predicted label (high or low) represents the underlying company's stock price variation on the next day, being either higher or lower than a certain threshold. The performance evaluation we have carried out through a walk-forward strategy, and against a set of solid baselines, shows that our approach clearly outperforms the competitors. Moreover, the devised Artificial Intelligence (AI) approach is explainable, in the sense that we analyze the white-box behind the classifier and provide a set of explanations on the obtained results.
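The labeling and walk-forward evaluation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `label_variations` and `walk_forward_splits`, the use of the absolute relative price change as the "magnitude", and the fixed-size rolling training window are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Iterator, List, Tuple


def label_variations(prices: List[float], threshold: float) -> List[int]:
    """Label each day 1 ("high") or 0 ("low").

    Assumption: the magnitude of the next-day variation is the absolute
    relative change between consecutive prices, compared to `threshold`.
    """
    labels = []
    for today, tomorrow in zip(prices, prices[1:]):
        variation = abs(tomorrow - today) / today
        labels.append(1 if variation > threshold else 0)
    return labels


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges in chronological order.

    Assumption: a rolling window of fixed size; each test block
    immediately follows its training block, so no future data leaks
    into training. The window advances by `test_size` each step.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    prices = [100.0, 101.0, 110.0, 109.0]
    print(label_variations(prices, threshold=0.05))
    for train, test in walk_forward_splits(10, train_size=4, test_size=2):
        print(list(train), list(test))
```

In each walk, a classifier such as a decision tree would be fitted on the training indices and scored on the test indices; the lexicon-derived features themselves are outside the scope of this sketch.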
