Accounting for burstiness in topic models

Authors: Gabriel Doyle, Charles Elkan

DOI: 10.1145/1553374.1553410

Abstract: Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once in a document, it is more likely to be used again. We introduce a topic model that uses Dirichlet compound multinomial (DCM) distributions to model this burstiness phenomenon. On both text and non-text datasets, the new model achieves better held-out likelihood than standard latent Dirichlet allocation (LDA). It is straightforward to incorporate the DCM extension into topic models that are more complex than LDA.
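
The burstiness property described in the abstract can be illustrated with the DCM distribution on its own. The following is a minimal Python sketch, not the topic model introduced in the paper: it evaluates the standard DCM (Polya) probability of a word-count vector and the corresponding next-word probability. The function names, the four-word vocabulary, and the parameter values in `alpha` are assumptions chosen for illustration only.

```python
from math import lgamma, exp

def dcm_log_prob(counts, alpha):
    """Log-probability of a word-count vector under a DCM (Polya) distribution."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: n! / prod_w(x_w!)
    log_p = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Ratio of gamma functions over the totals: Gamma(A) / Gamma(A + n)
    log_p += lgamma(a) - lgamma(a + n)
    # Per-word terms: Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_p += sum(lgamma(x + al) - lgamma(al) for x, al in zip(counts, alpha))
    return log_p

def dcm_next_word_prob(word, counts, alpha):
    """Probability that the next token is `word`, given the counts seen so far."""
    return (alpha[word] + counts[word]) / (sum(alpha) + sum(counts))

# Hypothetical 4-word vocabulary; small alpha values make the DCM bursty.
alpha = [0.1, 0.1, 0.1, 0.1]

# Probability of word 0 before and after it has been seen once.
before = dcm_next_word_prob(0, [0, 0, 0, 0], alpha)  # 0.25
after = dcm_next_word_prob(0, [1, 0, 0, 0], alpha)   # 1.1 / 1.4

# Two-token documents: one word repeated vs. two distinct words.
p_repeat = exp(dcm_log_prob([2, 0, 0, 0], alpha))
p_distinct = exp(dcm_log_prob([1, 1, 0, 0], alpha))

print(f"P(word 0) before: {before:.3f}, after one occurrence: {after:.3f}")
print(f"P(repeat) = {p_repeat:.4f}, P(distinct) = {p_distinct:.4f}")
```

Under a uniform multinomial over the same four words, the repeated-word document has probability 1/16 and the two-distinct-words document 2/16. With these small `alpha` values the DCM reverses that ordering, which is the burstiness effect the abstract refers to.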

References (16)
Michael A. Newton, Adrian E. Raftery. Approximate Bayesian Inference with the Weighted Likelihood Bootstrap. Journal of the Royal Statistical Society: Series B (Methodological), vol. 56, pp. 3-26 (1994). 10.1111/J.2517-6161.1994.TB01956.X
Edoardo M. Airoldi, Eric P. Xing, Stephen E. Fienberg. Mixed membership analysis of genome-wide expression data. arXiv: Quantitative Methods (2007).
David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, vol. 3, pp. 993-1022 (2003). 10.5555/944919.944937
T. L. Griffiths, M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, vol. 101, pp. 5228-5235 (2004). 10.1073/PNAS.0307752101
Ciyou Zhu, Richard H. Byrd, Peihuang Lu, Jorge Nocedal. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on Mathematical Software, vol. 23, pp. 550-560 (1997). 10.1145/279232.279236
Charles Elkan. Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution. Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 289-296 (2006). 10.1145/1143844.1143881
Gilles Celeux, Didier Chauveau, Jean Diebolt. Stochastic versions of the EM algorithm: an experimental study in the mixture case. Journal of Statistical Computation and Simulation, vol. 55, pp. 287-314 (1996). 10.1080/00949659608811772
Wei Li, Andrew McCallum. Pachinko allocation. Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 577-584 (2006). 10.1145/1143844.1143917
Fei-Fei Li, P. Perona. A Bayesian hierarchical model for learning natural scene categories. Computer Vision and Pattern Recognition, vol. 2, pp. 524-531 (2005). 10.1109/CVPR.2005.16