Modeling Topics Using Statistical Distributions

Authors: David L. Marvit, Yannis Labrou, John J. Sidorowich, B. Thomas Adler, Alex Gilman

DOI:

Keywords:

Abstract: In one embodiment, modeling topics includes accessing a corpus comprising documents that include words. Words of a document are selected as keywords of the document. The documents are clustered according to the keywords to yield clusters, where each cluster corresponds to a topic. A statistical distribution is generated for a cluster from words of the documents of the cluster. A topic is modeled using the statistical distribution generated for the cluster corresponding to the topic.
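The pipeline in the abstract (select keywords, cluster documents by keyword, build a word distribution per cluster) can be illustrated with a minimal sketch. The abstract does not say how keywords are scored, how documents are clustered, or which distribution is used, so everything specific here is an assumption chosen for illustration: TF-IDF scoring for keyword selection, a greedy "shares at least one keyword" clustering rule, and a plain unigram relative-frequency distribution. The function names (`select_keywords`, `cluster_by_keywords`, `topic_distribution`, `model_topics`) and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def select_keywords(docs, k=2):
    """Pick the k highest-scoring words of each document as its keywords.

    Assumption: TF-IDF scoring; the source does not specify a method.
    """
    n = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for d in docs for w in set(d))
    keywords = []
    for d in docs:
        tf = Counter(d)
        score = {w: tf[w] * math.log(n / df[w]) for w in tf}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(score, key=lambda w: (-score[w], w))
        keywords.append(set(ranked[:k]))
    return keywords


def cluster_by_keywords(keywords):
    """Greedy clustering (assumed rule): a document joins the first
    cluster whose keyword set overlaps its own, else starts a new one."""
    clusters = []  # list of (cluster keyword set, member document indices)
    for i, kw in enumerate(keywords):
        for cluster_kw, members in clusters:
            if cluster_kw & kw:
                cluster_kw |= kw  # in-place update of the cluster's keywords
                members.append(i)
                break
        else:
            clusters.append((set(kw), [i]))
    return [members for _, members in clusters]


def topic_distribution(docs, members):
    """Unigram distribution over all words of the cluster's documents."""
    counts = Counter(w for i in members for w in docs[i])
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def model_topics(docs, k=2):
    """Return one (member indices, word distribution) pair per topic."""
    keywords = select_keywords(docs, k)
    clusters = cluster_by_keywords(keywords)
    return [(m, topic_distribution(docs, m)) for m in clusters]


# Toy corpus: each document is a list of words.
docs = [
    "cat dog cat pet".split(),
    "dog cat pet food".split(),
    "stock market stock trade".split(),
    "market trade stock price".split(),
]
for members, dist in model_topics(docs):
    print(members, sorted(dist.items(), key=lambda kv: -kv[1])[:3])
```

Each topic is represented here only by the word frequencies of its cluster. A real system would likely need a more robust clustering step, since the greedy rule depends on document order.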

References (17)
John Grothendieck, Allen Louis Gorin, Jeremy Huntley Greet Wright. Apparatus and method for analysis of language model changes (2006).
Christopher G. Hill, Shivakumar Vaithyanathan, Mark R. Adler. Computer method and apparatus for clustering documents and automatic generation of cluster keywords (1996).
Wendy E. Cowley, Shawn J. Bohn, Manoj Kumar Krishnan, Jarek Nieplocha. Methods and apparatuses for information analysis on shared and distributed computing systems (2006).
Xin Liu, Yihong Gong, Wei Xu, Shenghuo Zhu. Document clustering with cluster refinement and model selection capabilities. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), pp. 191-198 (2002). doi:10.1145/564376.564411
Hang Li, Kenji Yamanishi. Topic analysis using a finite mixture model. Information Processing and Management, vol. 39, pp. 521-541 (2003). doi:10.1016/S0306-4573(02)00035-3
S. Momtazi, H. Sameti, M. Bahrani, N. Hafezi. A POS-based fuzzy word clustering algorithm for continuous speech recognition systems. Information Sciences, Signal Processing and Their Applications, pp. 1-4 (2007). doi:10.1109/ISSPA.2007.4555528
M. Tamoto, T. Kawabata. Clustering word category based on binomial posteriori co-occurrence distribution. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 165-168 (1995). doi:10.1109/ICASSP.1995.479390
Neelakantan Sundaresan, Jeonghee Yi. Method and system for classifying semi-structured documents (2000).