Authors: Shivakumar Vaithyanathan, Mayank Prakash, Robert Travis
DOI:
Keywords:
Abstract: A top-down clustering method and apparatus recursively processes clusters of documents by first extracting features from the documents comprising the cluster, then using the extracted features to generate sub-clusters, and finally using the generated sub-clusters to develop topic identifiers for each sub-cluster. This process is repeated on each cluster and sub-cluster in a recursive manner, so that feature extraction is performed on each document cluster in order to perform sub-clustering. Feature extraction is performed by computing frequency counts of terms taken from the cluster and discarding terms falling outside predetermined boundaries, which are computed based on the total number of documents in the cluster. After bounding, the tokens are reduced prior to clustering by means of a correlation technique, such as a PCA model.
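The recursive procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the names `bounded_terms`, `top_down`, and `split_on_first_term` are hypothetical, "frequency counts" is read here as document frequency, the bounds are simple fractions of the cluster size, the surviving terms double as crude topic identifiers, and the correlation-based reduction step (such as PCA) is omitted entirely.

```python
from collections import Counter


def bounded_terms(docs, low_frac=0.1, high_frac=0.9):
    """Keep terms whose document frequency lies within bounds.

    Assumption: the bounds are fractions of the number of documents
    in the cluster; the abstract does not give the actual formula.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        df.update(set(doc.lower().split()))
    lo, hi = low_frac * n, high_frac * n
    return sorted(t for t, c in df.items() if lo <= c <= hi)


def split_on_first_term(docs, terms):
    """Hypothetical stand-in for a real clustering step.

    Splits the cluster by presence of the first surviving term.
    """
    key = terms[0]
    with_key = [d for d in docs if key in d.lower().split()]
    without_key = [d for d in docs if key not in d.lower().split()]
    return [with_key, without_key]


def top_down(docs, split, min_size=2, depth=0, max_depth=3):
    """Recursively extract features, sub-cluster, and label each node."""
    terms = bounded_terms(docs)
    node = {"size": len(docs), "terms": terms, "children": []}
    if len(docs) < min_size or depth >= max_depth or not terms:
        return node
    # Drop empty parts; stop if the split did not separate anything.
    parts = [p for p in split(docs, terms) if p]
    if len(parts) < 2:
        return node
    node["children"] = [
        top_down(p, split, min_size, depth + 1, max_depth) for p in parts
    ]
    return node


docs = [
    "the apple banana",
    "the apple cherry",
    "the dog cat",
    "the dog bird",
]
tree = top_down(docs, split_on_first_term)
```

Because the bounds are recomputed from the size of each cluster, a term that is too common to be useful at the top level can become discriminating inside a sub-cluster, and vice versa. A real system would replace `split_on_first_term` with an actual clustering algorithm applied to the reduced feature space.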