Mining Text and Time Series Data with Applications in Finance

Author: Joe Staines

DOI:

Keywords:

Abstract: Finance is a field extremely rich in data, and has great need of methods for summarizing and understanding these data. Existing methods of multivariate analysis allow the discovery of structure in time series data, but that structure can be difficult to interpret. Often there exists a wealth of text data directly related to the time series. In this thesis it is shown that this text can be exploited to aid interpretation of, and even to improve, the structure uncovered. To this end, two approaches are described and tested. Both serve to uncover structure in the relationship between text and time series data, but do so in very different ways. The first model comes from the field of topic modelling. A novel topic model is developed, closely related to an existing topic model for mixed data. Improved held-out likelihood is demonstrated for this model on a corpus of UK equity market data, and the discovered structure is qualitatively examined. To the authors’ knowledge this is the first attempt to combine text and time series data in a single generative topic model. The second method is a simpler, discriminative method based on a low-rank decomposition of time series data with constraints determined by word frequencies in the text data. This is compared to topic modelling using both the equity data and a second corpus comprising foreign exchange rate time series and text describing global macroeconomic sentiments, showing further improvements in held-out likelihood. One example of an application for the inferred structure is also demonstrated: the construction of carry trade portfolios. The superior results using this second method serve as a reminder that methodological complexity does not guarantee performance gains.
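To make the second approach more concrete, here is a minimal sketch of one way a low-rank decomposition of time series data could be constrained by word frequencies. It is not the thesis's actual method, which the abstract does not specify. The sketch assumes that each series' loading vector must be a linear combination of that series' word frequencies, so the loadings lie in the column space of a word-frequency matrix. The function name `constrained_low_rank`, the matrices `R` and `W`, and the toy data are all hypothetical.

```python
import numpy as np

def constrained_low_rank(R, W, k):
    """Rank-k approximation of R (time x series) whose loadings lie in
    the column space of W (series x vocabulary). Illustrative assumption only."""
    # Orthonormal basis Q for the column space of W.
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    Q = U[:, s > 1e-10 * s.max()]
    # Project the series onto that basis, then keep the top k components.
    Uz, sz, Vzt = np.linalg.svd(R @ Q, full_matrices=False)
    F = Uz[:, :k] * sz[:k]   # time x k latent factor series
    B = Q @ Vzt[:k].T        # series x k loadings, each column in span(W)
    return F, B

# Toy data: 12 series, 5 vocabulary terms, 2 latent factors (all made up).
rng = np.random.default_rng(0)
T, N, V, k = 200, 12, 5, 2
W = rng.poisson(3.0, size=(N, V)).astype(float)   # toy word counts per series
A = rng.normal(size=(V, k))                       # maps words to factor loadings
R = rng.normal(size=(T, k)) @ (W @ A).T + 0.1 * rng.normal(size=(T, N))

F, B = constrained_low_rank(R, W, k)
R_hat = F @ B.T   # constrained rank-k reconstruction of R
```

Under this assumption the Frobenius-norm problem has a closed-form solution: the part of `R` orthogonal to the span of `W` cannot be fitted, and the remainder is handled by a truncated SVD of the projected data `R @ Q`.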
