Topic models conditioned on arbitrary features with Dirichlet-multinomial regression

Authors: David Mimno, Andrew McCallum

Abstract: Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.
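
To make the phrase "log-linear prior on document-topic distributions" concrete, the following is a minimal sketch and not the authors' implementation. It assumes each topic k has a weight vector, and that the Dirichlet parameter for topic k in a document is the exponential of the dot product between that vector and the document's feature vector. The feature layout, the weight values, and the function names `dmr_alpha` and `sample_topic_distribution` are hypothetical and chosen only for illustration.

```python
import math
import random


def dmr_alpha(features, weights):
    """Assumed log-linear form: alpha[k] = exp(features . weights[k])."""
    return [
        math.exp(sum(x * w for x, w in zip(features, w_k)))
        for w_k in weights
    ]


def sample_topic_distribution(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


# Hypothetical document features: [bias, is_venue_A, is_venue_B]
doc_features = [1.0, 1.0, 0.0]

# Hypothetical weight vectors, one row per topic
topic_weights = [
    [0.0, 2.0, -1.0],   # topic 0 is favored by venue A
    [0.0, -1.0, 2.0],   # topic 1 is favored by venue B
    [0.0, 0.0, 0.0],    # topic 2 ignores the venue
]

rng = random.Random(0)
alpha = dmr_alpha(doc_features, topic_weights)
theta = sample_topic_distribution(alpha, rng)

print("alpha:", alpha)
print("theta:", theta)
```

Under these assumed weights, a document from venue A gets a larger Dirichlet parameter for topic 0 than for the other topics, so its sampled topic distribution tends to put more mass there. In a full model the weights would be learned from data rather than set by hand.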
