Abstract: Many different topic models have been used successfully for a variety of applications. However, even state-of-the-art topic models suffer from the important flaw that they do not capture the tendency of words to appear in bursts; it is a fundamental property of language that if a word is used once in a document, it is more likely to be used again. We introduce a topic model that uses Dirichlet compound multinomial (DCM) distributions to model this burstiness phenomenon. On both text and non-text datasets, the new model achieves better held-out likelihood than standard latent Dirichlet allocation (LDA). It is straightforward to incorporate the DCM extension into topic models that are more complex than LDA.
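The burstiness property the abstract describes can be illustrated by sampling from a DCM via its Pólya urn scheme: each drawn word is added back to the urn, so repeating a word becomes more likely than under a plain multinomial. The sketch below is illustrative only and is not the paper's model; the function name `sample_dcm` and the parameter values are assumptions for the example.

```python
import random

def sample_dcm(alpha, n, rng=random.Random(0)):
    """Draw n words from a Dirichlet compound multinomial (Polya urn).

    alpha: per-word pseudo-counts. Each drawn word's count is added
    to its weight before the next draw, so already-seen words become
    more likely -- the burstiness effect described in the abstract.
    (Illustrative sketch; not the paper's inference procedure.)
    """
    counts = [0] * len(alpha)
    for _ in range(n):
        # Posterior predictive weights: pseudo-count plus observed count.
        weights = [a + c for a, c in zip(alpha, counts)]
        r = rng.random() * sum(weights)
        acc = 0.0
        for w, wt in enumerate(weights):
            acc += wt
            if r < acc:
                counts[w] += 1
                break
    return counts

# With a small symmetric alpha, the 20 draws concentrate on a few
# words (bursty), whereas a multinomial with the same mean would
# spread them much more evenly across all 10 words.
print(sample_dcm([0.1] * 10, 20))
```

Small values of `alpha` make the reinforcement effect dominate, producing the heavy-tailed per-document word counts that a plain multinomial topic model cannot represent.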