Authors: G. Salton, C. S. Yang, C. T. Yu
Keywords: Information processing, Recall, Computer science, Automatic indexing, Word lists by frequency, Linear discriminant analysis, Information retrieval, Term (time), Artificial intelligence, Text mining, Search engine indexing, Content analysis, Natural language processing
Abstract: A good deal of work has been done over the years in an attempt to use statistical or probabilistic techniques as a basis for automatic indexing and content analysis (1–10). Unfortunately, many of these methods are lacking in effectiveness, and the more refined procedures are computationally unattractive. A new technique, known as discrimination value analysis, ranks the text words in accordance with how well they are able to discriminate the documents of a collection from each other; that is, the value of a term …
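
Illustration: the abstract is cut off before it defines the value of a term, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes one common formulation of a discrimination value: the change in the average pairwise similarity (the "density") of the collection when a term is removed. Cosine similarity, the dense list-of-weights document representation, and the function names `cosine`, `density`, and `discrimination_values` are all hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def density(docs):
    """Average similarity over all distinct document pairs (assumed density measure)."""
    n = len(docs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(cosine(docs[i], docs[j]) for i, j in pairs) / len(pairs)


def discrimination_values(docs):
    """For each term k, return density without term k minus the baseline density.

    Under this assumed definition, a positive value means that removing the
    term makes the documents look more alike, so the term helps tell them apart.
    """
    baseline = density(docs)
    n_terms = len(docs[0])
    values = []
    for k in range(n_terms):
        # Drop column k from every document vector.
        reduced = [[w for t, w in enumerate(doc) if t != k] for doc in docs]
        values.append(density(reduced) - baseline)
    return values


# Toy collection: term 0 occurs in every document, terms 1-3 occur in one each.
toy_docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
toy_values = discrimination_values(toy_docs)
# Rank term indices from best to worst discriminator.
ranking = sorted(range(len(toy_values)), key=lambda k: toy_values[k], reverse=True)
```

In this toy collection the term shared by every document receives a negative value and ranks last, while the terms confined to a single document receive positive values. Recomputing the density once per term is quadratic in the number of documents, which is acceptable for a sketch but not for a large collection.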