Inferring emerging and evolving topics in streaming text

Authors: Saha Ankan, Arindam Banerjee, Shiva P Kasiviswanathan, Richard D Lawrence, Prem Melville

DOI:

Keywords:

Abstract: A method, system, and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices from the text in the documents and analyzing these matrices to identify a first group of topics as evolving topics and a second group of topics as emerging topics. The matrices include a first matrix X identifying a multitude of words in each of the documents, a second matrix W identifying a multitude of topics in each of the documents, and a third matrix H identifying a multitude of words for each of the multitude of topics. In an embodiment, the documents form a streaming dataset, and two forms of temporal regularizers are used to help identify the evolving topics and the emerging topics in the streaming dataset.
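
The roles of X, W, and H can be illustrated with a small sketch. The abstract does not name the factorization algorithm, so the code below assumes a plain non-negative factorization X ≈ WH fitted with standard multiplicative updates. The function name `factorize`, its parameters, and the toy data are hypothetical, and the two temporal regularizers mentioned in the abstract are not modeled here.

```python
import numpy as np

def factorize(X, n_topics, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (documents x words) as W (documents x topics) @ H (topics x words).

    Assumption: plain non-negative matrix factorization with multiplicative
    updates. The abstract's temporal regularizers are NOT included.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    W = rng.random((n_docs, n_topics)) + eps   # topic weights per document
    H = rng.random((n_topics, n_words)) + eps  # word weights per topic
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy document-word count matrix: 4 documents, 4 words (made-up data).
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
], dtype=float)

W, H = factorize(X, n_topics=2)
print("W (documents x topics):\n", W.round(2))
print("H (topics x words):\n", H.round(2))
```

In a streaming setting, one would presumably refit W and H as new documents arrive and compare the rows of H across time steps; how the evolving and emerging topics are separated is not specified in the abstract and is left out of this sketch.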
