GaP

Author: John Canny

DOI: 10.1145/1008992.1009016

Keywords:

Abstract: We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called "GaP" for Gamma-Poisson, the distributions of the first and last random variable. GaP is a factor model, that is, it gives an approximate factorization of the document-term matrix into a product of matrices Λ and X. These factors have strictly non-negative terms. GaP is a generative probabilistic model that assigns finite probabilities to documents in a corpus. It can be computed with an efficient and simple EM recurrence. For a suitable choice of parameters, the GaP factorization maximizes independence between the factors. So it can be used as an independent-component algorithm adapted to document data. The form of the GaP model is empirically as well as analytically motivated. It gives very accurate results as a probabilistic model (measured via perplexity) and as a retrieval model. The GaP model projects documents and terms into a low-dimensional space of "themes," and models texts as "passages" of terms on the same theme.
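
The generative structure named in the abstract (Gamma-distributed first variable, Poisson-distributed last variable, non-negative factors Λ and X) can be illustrated with a minimal sampling sketch. This is not the paper's implementation: the function name `sample_gap_corpus`, the Gamma shape and scale values, the matrix sizes, and the choice to normalize each column of Λ to sum to one are all assumptions made for illustration.

```python
import numpy as np

def sample_gap_corpus(n_terms=50, n_themes=3, n_docs=8,
                      shape=1.2, scale=5.0, seed=0):
    """Draw a small term-by-document count matrix from a Gamma-Poisson factor model.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Lambda: non-negative term-by-theme matrix. Each column is drawn from a
    # flat Dirichlet so it sums to 1 (an assumed normalization).
    Lam = rng.dirichlet(np.ones(n_terms), size=n_themes).T   # (n_terms, n_themes)
    # X: non-negative theme-by-document weights, Gamma distributed
    # (the "first" random variable in the abstract's description).
    X = rng.gamma(shape, scale, size=(n_themes, n_docs))      # (n_themes, n_docs)
    # Expected term counts are the product of the two factors.
    Y = Lam @ X                                               # (n_terms, n_docs)
    # F: observed counts, Poisson distributed around Y
    # (the "last" random variable).
    F = rng.poisson(Y)
    return Lam, X, F

Lam, X, F = sample_gap_corpus()
print(F.shape)  # term-by-document count matrix
```

Under these assumptions, `Lam @ X` plays the role of the approximate factorization of the document-term matrix, and fitting the model amounts to recovering non-negative `Lam` and `X` from the observed counts `F`.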

References (11)
David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, vol. 3, pp. 993-1022 (2003). DOI: 10.5555/944919.944937
Wei Xu, Xin Liu, Yihong Gong. Document clustering based on non-negative matrix factorization. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 267-273 (2003). DOI: 10.1145/860435.860485
Darrell Laham, Thomas K. Landauer, Peter W. Foltz. Learning human-like knowledge by singular value decomposition: a progress report. Neural Information Processing Systems, vol. 10, pp. 45-51 (1997)
Thomas Hofmann. Probabilistic latent semantic indexing. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 51, pp. 50-57 (1999). DOI: 10.1145/3130348.3130370
A. Hyvärinen, E. Oja. Independent component analysis: algorithms and applications. Neural Networks, vol. 13, pp. 411-430 (2000). DOI: 10.1016/S0893-6080(00)00026-5
H. Sebastian Seung, Daniel D. Lee. Algorithms for non-negative matrix factorization. Neural Information Processing Systems, vol. 13, pp. 556-562 (2000)
Chengxiang Zhai, John Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 51, pp. 334-342 (2001). DOI: 10.1145/3130348.3130377
A. Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, vol. 10, pp. 626-634 (1999). DOI: 10.1109/72.761722
Thomas K. Landauer, Peter W. Foltz, Darrell Laham. An introduction to latent semantic analysis. Discourse Processes, vol. 25, pp. 259-284 (1998). DOI: 10.1080/01638539809545028