Author: Thomas Hofmann
DOI:
Keywords: Unsupervised learning, Artificial intelligence, Natural language processing, Automatic summarization, Word (computer architecture), Computer science, Abstraction (linguistics), Text mining, Latent class model, Machine learning, Information access
Abstract: This paper presents a novel statistical latent class model for text mining and interactive information access. The described learning architecture, called the Cluster-Abstraction Model (CAM), is purely data driven and utilizes context-specific word occurrence statistics. In an intertwined fashion, the CAM extracts hierarchical relations between groups of documents as well as an abstractive organization of keywords. An annealed version of the Expectation-Maximization (EM) algorithm for maximum likelihood estimation of the model parameters is derived. The benefits for retrieval and automated cluster summarization are investigated experimentally.