Infinite Author Topic Model based on Mixed Gamma-Negative Binomial Process

Authors: Xiangfeng Luo, Junyu Xuan, Jie Lu, Guangquan Zhang, Richard Yi Da Xu

Abstract: Incorporating the side information of a text corpus, e.g., authors, time stamps, and emotional tags, into traditional text mining models has gained significant interest in the areas of information retrieval, statistical natural language processing, and machine learning. One branch of these works is the so-called Author Topic Model (ATM), which incorporates the authors' interests as side information into the classical topic model. However, the existing ATM needs to predefine the number of topics, which is difficult and inappropriate in many real-world settings. In this paper, we propose an Infinite Author Topic (IAT) model to resolve this issue. Instead of assigning a discrete probability on a fixed number of topics, we use a stochastic process to determine the number of topics from the data itself. To be specific, we extend a gamma-negative binomial process to three levels in order to capture the author-document-keyword hierarchical structure. Furthermore, each document is assigned a mixed gamma process that accounts for the multiple authors' contributions towards this document. An efficient Gibbs sampling inference algorithm, with each conditional distribution being closed-form, is developed for the IAT model. Experiments on several datasets show the capability of our IAT model to learn the hidden topics and the authors' interests simultaneously.
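
The hierarchy summarized in the abstract can be illustrated with a small forward simulation. The sketch below is not the paper's construction or its Gibbs sampler: it assumes a finite truncation to `K` candidate topics, an equal-weight mixing rule over co-authors, and made-up hyperparameters (`gamma0`, `c`, `p`), and the function name `sample_truncated_gnb` is hypothetical. It only shows how gamma-distributed weights can flow from a shared base to authors to documents, and how negative binomial counts then leave most candidate topics unused.

```python
import numpy as np

TINY = 1e-12  # floor that keeps gamma shape parameters strictly positive


def sample_truncated_gnb(author_sets, K=50, gamma0=5.0, c=1.0, p=0.9, seed=0):
    """Toy finite-K simulation of a gamma / negative binomial hierarchy.

    author_sets: list of author lists, one per document.
    Returns an integer array of shape (n_docs, K): words per topic per document.
    All modelling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    authors = sorted({a for doc in author_sets for a in doc})

    # Shared base: K gamma weights, a finite stand-in for a gamma process.
    base_w = np.maximum(rng.gamma(shape=gamma0 / K, scale=1.0 / c, size=K), TINY)

    # Author level: each author's topic weights are gamma draws whose
    # expected value equals the shared base weights.
    author_w = {a: rng.gamma(shape=base_w, scale=1.0) for a in authors}

    counts = []
    for doc_authors in author_sets:
        # Document level: average the co-authors' weights (assumed mixing rule).
        mixed = np.mean([author_w[a] for a in doc_authors], axis=0)
        doc_w = rng.gamma(shape=np.maximum(mixed, TINY), scale=1.0)

        # Keyword level: negative binomial counts, drawn as a gamma-Poisson
        # mixture: lam ~ Gamma(r, scale=p / (1 - p)), n ~ Poisson(lam).
        lam = rng.gamma(shape=np.maximum(doc_w, TINY), scale=p / (1.0 - p))
        counts.append(rng.poisson(lam))
    return np.array(counts)


docs = [["a", "b"], ["a"], ["c"]]
counts = sample_truncated_gnb(docs, K=20, seed=1)
active_topics = int((counts.sum(axis=0) > 0).sum())
print("topics actually used:", active_topics, "of", counts.shape[1])
```

In this toy setting the number of topics with nonzero counts is an outcome of the draw rather than an input, which is the behaviour the abstract attributes to replacing a fixed topic count with a stochastic process.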
