Abstract: We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called "GaP" for Gamma-Poisson, the distributions of the first and last random variables. GaP is a factor model; that is, it gives an approximate factorization of the document-term matrix into a product of matrices Λ and X. These factors have strictly non-negative terms. GaP is a generative probabilistic model that assigns finite probabilities to documents in a corpus. It can be computed with an efficient and simple EM recurrence. For a suitable choice of parameters, the GaP factorization maximizes independence between the factors, so it can be used as an independent-component algorithm adapted to document data. The form of the GaP model is empirically as well as analytically motivated. It gives very accurate results both as a probabilistic model (measured via perplexity) and as a retrieval model. The GaP model projects documents and terms into a low-dimensional space of "themes," and models texts as "passages" of terms on the same theme.
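To make the idea of a non-negative factorization F ≈ ΛX fitted by simple multiplicative recurrences concrete, the following is a minimal sketch. It is NOT the GaP recurrence. Instead, it uses a simpler, related technique: the Lee-Seung multiplicative updates that maximize the Poisson likelihood of the counts. These updates can be derived as EM with latent per-factor counts, but they omit the Gamma prior on X that distinguishes GaP. Several choices here are our own assumptions and do not come from the abstract: the terms × documents orientation of F, the function names `poisson_factorize` and `kl_divergence`, the random initialization, and the `eps` guard.

```python
import math
import random


def _means(Lam, X):
    """Poisson means Y = Lam @ X for the current factors."""
    k = len(X)
    return [[sum(Lam[i][t] * X[t][j] for t in range(k)) for j in range(len(X[0]))]
            for i in range(len(Lam))]


def kl_divergence(F, Lam, X):
    """Generalized KL divergence D(F || Lam @ X). Minimizing it is equivalent
    to maximizing the Poisson log-likelihood of the counts F."""
    Y = _means(Lam, X)
    total = 0.0
    for f_row, y_row in zip(F, Y):
        for f, y in zip(f_row, y_row):
            total += (f * math.log(f / y) if f > 0 else 0.0) - f + y
    return total


def poisson_factorize(F, k, iters=200, seed=0, eps=1e-12):
    """Hypothetical helper: factor a terms x documents count matrix F as
    Lam (terms x k) times X (k x documents), both non-negative.
    Illustrative stand-in only; it has no Gamma prior on X, unlike GaP."""
    rng = random.Random(seed)
    n, m = len(F), len(F[0])
    # Strictly positive initialization; multiplicative updates keep entries >= 0.
    Lam = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    X = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]

    def ratios():
        # Observed counts divided by the current Poisson means.
        Y = _means(Lam, X)
        return [[F[i][j] / (Y[i][j] + eps) for j in range(m)] for i in range(n)]

    for _ in range(iters):
        # Update X (theme weights per document) with Lam held fixed.
        R = ratios()
        for t in range(k):
            denom = sum(Lam[i][t] for i in range(n)) + eps
            for j in range(m):
                X[t][j] *= sum(Lam[i][t] * R[i][j] for i in range(n)) / denom
        # Update Lam (term loadings per theme) with X held fixed.
        R = ratios()
        for t in range(k):
            denom = sum(X[t][j] for j in range(m)) + eps
            for i in range(n):
                Lam[i][t] *= sum(R[i][j] * X[t][j] for j in range(m)) / denom
    return Lam, X
```

For example, `Lam, X = poisson_factorize([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]], k=2)` fits two "themes" to four terms across three documents. Each update does not increase the divergence, and every entry stays non-negative because each factor is only ever rescaled by a non-negative ratio.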