SSHLDA: A Semi-Supervised Hierarchical Topic Model

Authors: Hongfei Yan, Zhao-Yan Ming, Si Li, Tat-Seng Chua, Xian-Ling Mao

DOI:

Keywords: Latent Dirichlet allocation, Machine learning, Computer science, Data mining, Cluster analysis, Perplexity, Artificial intelligence, Measure (data warehouse), Topic model, Process (engineering)

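Abstract: Supervised hierarchical topic modeling and unsupervised hierarchical topic modeling are usually used to obtain hierarchical topics, such as hLLDA and hLDA. Supervised hierarchical topic modeling makes heavy use of the information from observed hierarchical labels, but cannot explore new topics; while unsupervised hierarchical topic modeling is able to detect new topics automatically in the data space, it does not make use of any information from hierarchical labels. In this paper, we propose a semi-supervised hierarchical topic model which aims to explore new topics automatically in the data space while incorporating the information from observed hierarchical labels into the modeling process, called Semi-Supervised Hierarchical Latent Dirichlet Allocation (SSHLDA). We also prove that hLDA and hLLDA are special cases of SSHLDA. We conduct experiments on Yahoo! Answers and ODP datasets, and assess the performance in terms of perplexity and clustering. The experimental results show that the predictive ability of SSHLDA is better than that of the baselines, and that SSHLDA can also achieve significant improvement over the baselines for clustering on the FScore measure.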
