Document clustering based on cohesive terms

Author: William S. Spangler

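Abstract: A method for document categorization, and a storage medium that includes instructions for causing a computer to implement the method, are presented. The method includes identifying terms occurring in a collection of documents and determining a cohesion score for each of the terms. The cohesion score is a function of the cosine difference between each of the documents containing a term and the centroid of all the documents containing that term. The method further includes sorting the terms based on their cohesion scores. It also includes creating categories based on the cohesion scores of the terms, wherein each category includes only documents that (i) contain a selected one of the terms and (ii) have not already been assigned to a category. Finally, the method includes moving each document to the category with the nearest centroid, thereby refining the categories.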
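The procedure in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not define the tokenizer, the term weighting, or the exact function of the cosine difference, so the sketch assumes whitespace tokenization with raw term counts and takes a term's cohesion to be the mean cosine similarity between the documents containing it and their centroid (higher means more cohesive). The functions `cluster` and `cohesion_scores` and the parameters `max_categories`, `min_docs`, and `iterations` are hypothetical names introduced for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    # Assumption: naive lowercase/whitespace tokenization with raw term counts;
    # no stemming, stopword removal, or TF-IDF weighting.
    return Counter(text.lower().split())


def normalize(vec):
    # Scale a sparse vector (dict term -> weight) to unit length.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    # Component-wise mean of a non-empty list of sparse vectors.
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def cohesion_scores(docs, min_docs=2):
    # Hypothetical cohesion: mean cosine similarity between each document
    # containing the term and the centroid of those documents.
    scores = {}
    for term in {t for d in docs for t in d}:
        members = [d for d in docs if term in d]
        if len(members) >= min_docs:
            c = centroid(members)
            scores[term] = sum(cosine(d, c) for d in members) / len(members)
    return scores


def cluster(texts, max_categories=10, min_docs=2, iterations=5):
    docs = [normalize(tf_vector(t)) for t in texts]
    scores = cohesion_scores(docs, min_docs)
    # Sort terms by descending cohesion (ties broken alphabetically).
    ranked = sorted(scores, key=lambda t: (-scores[t], t))

    # Seed categories: each takes only documents that contain the selected
    # term and have not already been assigned to an earlier category.
    assignment = [None] * len(docs)
    labels = []
    for term in ranked:
        if len(labels) >= max_categories:
            break
        members = [i for i, d in enumerate(docs)
                   if term in d and assignment[i] is None]
        if len(members) < min_docs:
            continue
        for i in members:
            assignment[i] = len(labels)
        labels.append(term)
    if not labels:
        return labels, assignment

    # Refine: move every document (including still-unassigned ones) to the
    # category whose centroid is most cosine-similar, until nothing changes.
    for _ in range(iterations):
        cents = []
        for k in range(len(labels)):
            members = [docs[i] for i, a in enumerate(assignment) if a == k]
            cents.append(centroid(members) if members else {})
        new = [max(range(len(labels)), key=lambda k: cosine(d, cents[k]))
               for d in docs]
        if new == assignment:
            break
        assignment = new
    return labels, assignment
```

For example, `cluster(["apple banana", "apple banana cherry", "dog cat", "dog cat mouse"])` yields two categories, one per topic, with the two fruit documents sharing one label term and the two animal documents sharing the other. The refinement loop is read here as a k-means-style reassignment using cosine similarity; a document with zero similarity to every centroid falls into the first category, which a fuller implementation might handle differently.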