A Theory of Term Importance in Automatic Text Analysis

Authors: G. Salton, C. S. Yang, C. T. Yu

DOI: 10.1002/ASI.4630260106

Keywords: Information processing, Recall, Computer science, Automatic indexing, Word lists by frequency, Linear discriminant analysis, Information retrieval, Term (time), Artificial intelligence, Text mining, Search engine indexing, Content analysis, Natural language processing

Abstract: A good deal of work has been done over the years in an attempt to use statistical or probabilistic techniques as a basis for automatic indexing and content analysis (1–10). Unfortunately, many of these methods are lacking in effectiveness, and the more refined procedures are computationally unattractive. A new technique, known as discrimination value analysis, ranks the text words in accordance with how well they are able to discriminate the documents of a collection from each other; that is, the value of a term …
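
The abstract is cut off before it states how a term's value is computed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes (this is not given in the text above) that documents are term-weight vectors, that the "density" of a collection is the average pairwise cosine similarity, and that a term's discrimination value is the density with that term removed minus the density with it present. The names `cosine`, `density`, and `discrimination_values` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def density(docs):
    """Average pairwise cosine similarity (an assumed density measure)."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def discrimination_values(docs):
    """Assumed definition: density without term k minus density with all terms.

    A positive value means removing the term makes the documents look more
    alike, i.e. the term helps tell them apart.
    """
    base = density(docs)
    n_terms = len(docs[0])
    values = []
    for k in range(n_terms):
        # Drop column k from every document vector.
        reduced = [[w for t, w in enumerate(d) if t != k] for d in docs]
        values.append(density(reduced) - base)
    return values


# Toy collection: term 0 occurs in every document, terms 1-3 in one each.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
values = discrimination_values(docs)

# Rank term indices from best to worst discriminator.
ranking = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
```

In this toy collection the term shared by all three documents gets a negative value and ends up last in `ranking`, while each term that occurs in a single document gets a positive value. Recomputing all pairwise similarities for every term is quadratic in the number of documents and is used here only for clarity.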

References (11)
G. Salton, M. E. Lesk, Computer Evaluation of Indexing and Text Processing. Journal of the ACM, vol. 15, pp. 8-36 (1968), 10.1145/321439.321441
Abraham Bookstein, Don R. Swanson, Probabilistic Models for Automatic Indexing. Journal of the Association for Information Science and Technology, vol. 25, pp. 312-316 (1974), 10.1002/ASI.4630250505
H. P. Luhn, A Statistical Approach to Mechanized Encoding and Searching of Literary Information. IBM Journal of Research and Development, vol. 1, pp. 309-317 (1957), 10.1147/RD.14.0309
Shyam Kumar, Semantic Clustering of Index Terms. Journal of the ACM, vol. 15, pp. 493-513 (1968), 10.1145/321479.321480
Lauren B. Doyle, Indexing and Abstracting by Association. American Documentation, vol. 13, pp. 378-390 (1962), 10.1002/ASI.5090130404
M. E. Maron, Automatic Indexing: An Experimental Inquiry. Journal of the ACM, vol. 8, pp. 404-417 (1961), 10.1145/321075.321084
Fred J. Damerau, An Experiment in Automatic Indexing. American Documentation, vol. 16, pp. 283-289 (1965), 10.1002/ASI.5090160403
G. Salton, C. S. Yang, On the Specification of Term Values in Automatic Indexing. Journal of Documentation, vol. 29, pp. 351-372 (1973), 10.1108/EB026562
M. E. Maron, On Relevance, Probabilistic Indexing and Information Retrieval. Journal of the ACM, vol. 7, pp. 216-244 (1960), 10.1145/321033.321035
Karen Sparck Jones, A Statistical Interpretation of Term Specificity and Its Application in Retrieval. Journal of Documentation, vol. 28, pp. 11-21 (1972), 10.1108/EB026526