Authors: Daniel Ramage, David Hall, Ramesh Nallapati, Christopher D. Manning
Keywords:
Abstract: A significant portion of the world's text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags, and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA's latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA's improved expressiveness over traditional LDA with visualizations of a corpus of web pages from del.icio.us. Labeled LDA outperforms SVMs by more than 3 to 1 when extracting tag-specific document snippets. As a multi-label classifier, our model is competitive with a discriminative baseline on a variety of datasets.
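The one-to-one topic-tag constraint can be illustrated with a minimal sketch. It assumes collapsed Gibbs sampling with symmetric priors `alpha` and `beta`, a standard LDA inference technique; the function name `labeled_lda_gibbs`, its parameters, and the default values are hypothetical and not taken from the authors' implementation. The only change from ordinary LDA is that each word's topic is drawn from the document's own tags instead of from all topics, which is what lets the model attribute individual words to individual tags.

```python
import random
from collections import defaultdict


def labeled_lda_gibbs(docs, doc_labels, num_iters=50, alpha=0.1, beta=0.01, seed=0):
    """Hypothetical sketch of collapsed Gibbs sampling for Labeled LDA.

    docs: list of token lists; doc_labels: list of non-empty tag lists.
    Each tag is exactly one topic, and a word in document d may only be
    assigned to a topic in doc_labels[d].
    """
    assert all(doc_labels), "every document needs at least one tag"
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})        # vocabulary size

    n_dt = [defaultdict(int) for _ in docs]          # doc -> tag counts
    n_tw = defaultdict(lambda: defaultdict(int))     # tag -> word counts
    n_t = defaultdict(int)                           # tag -> total tokens

    # Initialize: assign each token a random tag from its own document's tags.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = rng.choice(doc_labels[d])
            z_d.append(t)
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1
        z.append(z_d)

    for _ in range(num_iters):
        for d, doc in enumerate(docs):
            labels = doc_labels[d]
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                t = z[d][i]
                n_dt[d][t] -= 1
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # Standard LDA conditional, restricted to this document's tags.
                weights = [
                    (n_dt[d][k] + alpha) * (n_tw[k][w] + beta) / (n_t[k] + V * beta)
                    for k in labels
                ]
                t = rng.choices(labels, weights=weights)[0]
                # Record the new assignment.
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][w] += 1
                n_t[t] += 1

    # Return per-token tag assignments and tag-word counts as plain dicts.
    return z, {t: dict(c) for t, c in n_tw.items()}


if __name__ == "__main__":
    docs = [["code", "bug", "code"], ["recipe", "oven"], ["code", "recipe", "bug", "oven"]]
    labels = [["python"], ["cooking"], ["python", "cooking"]]
    z, tag_words = labeled_lda_gibbs(docs, labels, num_iters=20)
    for doc, zd in zip(docs, z):
        print(list(zip(doc, zd)))
```

In the toy usage above, the single-tag documents pin `code` and `bug` to `python` and `recipe` and `oven` to `cooking`, so the sampler tends to split the two-tag third document along the same lines. Restricting the candidate topics to a document's tags is the whole modification; with every tag allowed in every document the loop reduces to ordinary collapsed Gibbs sampling for LDA.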