Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora

Authors: Daniel Ramage, David Hall, Ramesh Nallapati, Christopher D. Manning

DOI: 10.3115/1699510.1699543

Abstract: A significant portion of the world's text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA's latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA's improved expressiveness over traditional LDA with visualizations of a corpus of tagged web pages from del.icio.us. Labeled LDA outperforms SVMs by more than 3 to 1 when extracting tag-specific document snippets. As a multi-label text classifier, our model is competitive with a discriminative baseline on a variety of datasets.
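The one-to-one constraint between topics and tags described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes collapsed Gibbs sampling with symmetric Dirichlet priors, and the function name `labeled_lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are all hypothetical. The only Labeled-LDA-specific step is that each token's topic is sampled from its own document's tags rather than from all topics.

```python
import random
from collections import defaultdict


def labeled_lda_gibbs(docs, doc_labels, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler (assumed inference scheme, not the paper's code).

    docs:       list of token lists
    doc_labels: list of label lists, one per document; each label is one topic
    Returns the per-token label assignments and the label-word count table.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    label_word = defaultdict(lambda: defaultdict(int))  # count of word w under label k
    label_total = defaultdict(int)                      # total tokens under label k
    doc_label = [defaultdict(int) for _ in docs]        # tokens in doc d under label k
    assign = []

    # Random initialisation, restricted to each document's own labels.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.choice(doc_labels[d])
            z_d.append(z)
            label_word[z][w] += 1
            label_total[z] += 1
            doc_label[d][z] += 1
        assign.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            labels = doc_labels[d]
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                label_word[z][w] -= 1
                label_total[z] -= 1
                doc_label[d][z] -= 1

                # Standard LDA conditional, evaluated only over this document's labels.
                weights = [
                    (doc_label[d][k] + alpha)
                    * (label_word[k][w] + beta)
                    / (label_total[k] + vocab_size * beta)
                    for k in labels
                ]
                z = rng.choices(labels, weights=weights)[0]

                # Record the new assignment.
                assign[d][i] = z
                label_word[z][w] += 1
                label_total[z] += 1
                doc_label[d][z] += 1

    return assign, label_word


# Hypothetical toy corpus: one page carries two tags, two pages carry one each.
docs = [
    ["python", "code", "recipe", "cook"],
    ["python", "code", "code"],
    ["recipe", "cook", "cook"],
]
doc_labels = [["programming", "food"], ["programming"], ["food"]]
assign, label_word = labeled_lda_gibbs(docs, doc_labels, n_iters=50)
```

Here `assign[0]` holds the tag credited with each word of the two-tag page, which is the credit attribution the abstract refers to. Single-tag pages are fully determined by their tag, so they act as anchors for the word-tag counts.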

References (14)
Kamal Nigam, Andrew McCallum. A comparison of event models for naive Bayes text classification. National Conference on Artificial Intelligence, pp. 41-48 (1998).
David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, vol. 3, pp. 993-1022 (2003). DOI: 10.5555/944919.944937
T. L. Griffiths, M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, vol. 101, pp. 5228-5235 (2004). DOI: 10.1073/PNAS.0307752101
Daniel Ramage, Paul Heymann, Christopher D. Manning, Hector Garcia-Molina. Clustering the tagged web. Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM '09), pp. 54-63 (2009). DOI: 10.1145/1498759.1498809
Shuiwang Ji, Lei Tang, Shipeng Yu, Jieping Ye. Extracting shared subspace for multi-label classification. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), pp. 381-389 (2008). DOI: 10.1145/1401890.1401939
Jon D. McAuliffe, David M. Blei. Supervised topic models. Neural Information Processing Systems, vol. 20, pp. 121-128 (2007).
Wei Li, Andrew McCallum. Pachinko allocation. Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 577-584 (2006). DOI: 10.1145/1143844.1143917
David M. Blei, John D. Lafferty. Correlated topic models. Neural Information Processing Systems, vol. 18, pp. 147-154 (2005).
Qiaozhu Mei, Xuehua Shen, ChengXiang Zhai. Automatic labeling of multinomial topic models. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 490-499 (2007). DOI: 10.1145/1281192.1281246
Simon Lacoste-Julien, Fei Sha, Michael I. Jordan. DiscLDA: Discriminative learning for dimensionality reduction and classification. Neural Information Processing Systems, vol. 21, pp. 897-904 (2008).