Method and apparatus for automatically generating hierarchical categories from large document collections

Authors: Shivakumar Vaithyanathan, Mayank Prakash, Robert Travis

DOI:

Keywords:

Abstract: A top-down clustering method and apparatus recursively processes clusters of documents by first extracting features from the documents comprising the cluster, then using the extracted features to generate sub-clusters, and finally using the generated sub-clusters to develop topic identifiers for each sub-cluster. This process is repeated for each cluster and sub-cluster in a recursive manner, so that the same processing is performed on each document cluster to perform sub-clustering. Feature extraction is performed by taking frequency counts of terms taken from the documents and discarding terms falling outside predetermined boundaries, which are computed based on the total number of documents in the cluster. After bounding, the number of tokens is reduced prior to clustering by means of a correlation technique, such as a PCA model.
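The recursion described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the patented implementation: the function names, the 20%/80% document-frequency bounds, the stopping rules, and in particular the splitting rule, which is a simple pivot-term split standing in for the real sub-clustering step. The correlation-based token reduction (for example PCA) is omitted to keep the sketch dependency-free.

```python
from collections import Counter

def doc_freq(docs):
    # For each term, count how many documents in the cluster contain it.
    return Counter(t for d in docs for t in set(d.lower().split()))

def bounded_terms(docs, low=0.2, high=0.8):
    # Discard terms whose document frequency falls outside bounds
    # computed from the number of documents in the cluster.
    # The 0.2 / 0.8 fractions are illustrative assumptions.
    n = len(docs)
    return {t: c for t, c in doc_freq(docs).items() if low * n <= c <= high * n}

def topic_label(docs, k=2):
    # Hypothetical topic identifier: the k terms found in the most
    # documents, with ties broken alphabetically.
    df = doc_freq(docs)
    return sorted(df, key=lambda t: (-df[t], t))[:k]

def cluster(docs, min_size=2, depth=0, max_depth=3):
    # Build a node, then recurse into its sub-clusters (top-down).
    node = {"topic": topic_label(docs), "docs": docs, "children": []}
    terms = bounded_terms(docs)
    if len(docs) <= min_size or depth >= max_depth or not terms:
        return node
    # Stand-in for sub-clustering: pivot on the kept term whose document
    # frequency is closest to half the cluster size.
    n = len(docs)
    pivot = min(terms, key=lambda t: (abs(terms[t] - n / 2), t))
    has = [d for d in docs if pivot in d.lower().split()]
    lacks = [d for d in docs if pivot not in d.lower().split()]
    for sub in (has, lacks):
        if sub:
            node["children"].append(cluster(sub, min_size, depth + 1, max_depth))
    return node

docs = [
    "apple banana fruit",
    "apple cherry fruit",
    "car engine vehicle",
    "car wheel vehicle",
]
tree = cluster(docs)
for child in tree["children"]:
    print(child["topic"], child["docs"])
```

On this toy collection the sketch splits the four documents into a fruit group and a vehicle group and labels each with its two most widespread terms. A fuller version would replace the pivot split with a proper clustering algorithm applied to the reduced feature vectors.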

References (8)
Douglass R. Cutting, Jan O. Pedersen, John W. Tukey, David Karger, Scatter-gather: a cluster-based method and apparatus for browsing large document collections (1991)
Frederick S. M. Herz, Jason M. Eisner, Lyle H. Ungar, Mitchell P. Marcus, System for generation of user profiles for a system for customized electronic identification of desirable objects (1995)
George R. Doddington, Enrico Bocchieri, Speaker-independent speech recognition method and system, The Journal of the Acoustical Society of America, vol. 90, p. 3392 (1991), 10.1121/1.401359
Douglass R. Cutting, David R. Karger, Jan O. Pedersen, John W. Tukey, Scatter/Gather: a cluster-based approach to browsing large document collections, International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 51, pp. 318-329 (1992), 10.1145/3130348.3130362
Ajay Mohindra, Murthy Devarakonda, Distributed token management in Calypso file system, IEEE Symposium on Parallel and Distributed Processing - Proceedings, pp. 290-297 (1994)