Authors: Hongfei Yan, Zhao-Yan Ming, Si Li, Tat-Seng Chua, Xian-Ling Mao
DOI:
Keywords: Latent Dirichlet allocation, Machine learning, Computer science, Data mining, Cluster analysis, Perplexity, Artificial intelligence, Measure (data warehouse), Topic model, Process (engineering)
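Abstract: Supervised hierarchical topic modeling and unsupervised hierarchical topic modeling are usually used to obtain hierarchical topics, such as hLLDA and hLDA. Supervised hierarchical topic modeling makes heavy use of the information from observed hierarchical labels, but cannot explore new topics; while unsupervised hierarchical topic modeling is able to detect new topics in the data space automatically, it does not make use of any information from hierarchical labels. In this paper, we propose a semi-supervised hierarchical topic model which aims to explore new topics automatically in the data space while incorporating the information from observed hierarchical labels into the modeling process, called Semi-Supervised Hierarchical Latent Dirichlet Allocation (SSHLDA). We also prove that hLDA and hLLDA are special cases of SSHLDA. We conduct experiments on Yahoo! Answers and ODP datasets, and assess the performance in terms of perplexity and clustering. The experimental results show that the predictive ability of SSHLDA is better than that of the baselines, and that SSHLDA can also achieve significant improvement over the baselines for clustering on the FScore measure.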