Inferring emerging and evolving topics in streaming text

Authors: Saha Ankan, Arindam Banerjee, Shiva P Kasiviswanathan, Richard D Lawrence, Prem Melville

DOI:

Keywords:

Abstract: A method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify evolving topics and emerging topics. The matrices include a matrix X identifying a multitude of words in each of the documents, a matrix W identifying a multitude of topics in each of the documents, and a matrix H identifying a multitude of words for each of the multitude of topics. These matrices are analyzed to identify the evolving and emerging topics. In an embodiment, two forms of temporal regularizers are used to help identify the evolving and emerging topics. In another embodiment, a two-stage approach involving detection and clustering is used to help identify the evolving and emerging topics.
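
To make the roles of the three matrices concrete, the following is a minimal sketch, not the method claimed in the abstract. It assumes that X (documents by words) is approximated by the product of W (documents by topics) and H (topics by words), that all entries are non-negative, and that the factors are fitted with standard multiplicative updates for the squared reconstruction error. The abstract states none of these choices. The sample documents, the function names and the topic count of 2 are hypothetical, and the temporal regularizers and the detection and clustering stage are not shown.

```python
import random

def build_term_matrix(docs):
    """Return (X, vocab), where X[i][j] counts word vocab[j] in document i."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    X = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w in d.lower().split():
            X[i][index[w]] += 1.0
    return X, vocab

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def frob_error(X, W, H):
    """Squared Frobenius norm of X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def scale(M, num, den, eps):
    """Elementwise M * num / (den + eps); keeps entries non-negative."""
    return [[m * a / (b + eps) for m, a, b in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]

def factorize(X, k, iters=200, seed=0, eps=1e-9):
    """Assumed factorization X ~ WH via multiplicative updates.

    W is documents x topics and H is topics x words. Also returns the
    reconstruction error of the random starting point for comparison.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    initial_error = frob_error(X, W, H)
    for _ in range(iters):
        Wt = transpose(W)
        H = scale(H, matmul(Wt, X), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        W = scale(W, matmul(X, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H, initial_error

# Hypothetical toy corpus: two documents on finance, two on sport.
docs = [
    "stock market prices fall",
    "market prices rise stock",
    "team wins football match",
    "football match team loses",
]
X, vocab = build_term_matrix(docs)
W, H, initial_error = factorize(X, k=2)
final_error = frob_error(X, W, H)
```

Under these assumptions, each row of W gives the topic weights of one document and each row of H gives the word weights of one topic. In a streaming setting the factorization would be repeated as new documents arrive, which is where temporal regularization of the kind the abstract mentions would come in.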
