Inductive learning algorithms and representations for text categorization

Authors: Susan Dumais, John Platt, David Heckerman, Mehran Sahami

DOI: 10.1145/288627.288651

Abstract: Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy. We also examine training set size and alternative document representations. Very accurate text classifiers can be learned automatically from training examples. Linear Support Vector Machines (SVMs) are particularly promising because they are very accurate, quick to train, and quick to evaluate.

References (28)
V. N. Vapnik. The Nature of Statistical Learning Theory. (1995)
Leon Bottou, V. Vapnik, Yann LeCun, I. Guyon, Eduard Sackinger, Corinna Cortes, U. A. Muller, Patrice Simard, Harris Drucker, L. D. Jackel, J. S. Denker. Learning algorithms for classification: A comparison on handwritten digit recognition. World Scientific, pp. 261-276 (1995)
Mehran Sahami. Learning limited dependence Bayesian classifiers. Knowledge Discovery and Data Mining, pp. 335-338 (1996)
Norbert Fuhr, Kostas Tzeras, Gerhard Knorz, Stephan Hartmann, Michael Schwantner, Gerhard Lustig. AIR/X - A rule-based multistage indexing system for large subject fields. Intelligent Text and Image Handling, pp. 606-623 (1991)
Robert E. Schapire, Yoav Freund, Wee Sun Lee, Peter Bartlett. Boosting the margin: A new explanation for the effectiveness of voting methods. International Conference on Machine Learning, pp. 322-330 (1997)
Mehran Sahami, Susan Dumais, Eric Horvitz, David Heckerman. A Bayesian Approach to Filtering Junk E-Mail. National Conference on Artificial Intelligence (1998)
David Maxwell Chickering, David Heckerman, Christopher Meek. A Bayesian approach to learning Bayesian networks with local structure. Uncertainty in Artificial Intelligence, pp. 80-89 (1997)
Gerard Salton, Michael J. McGill. Introduction to Modern Information Retrieval. (1983)
David D. Lewis, Karen Spärck Jones. Natural language processing for information retrieval. Communications of the ACM, vol. 39, pp. 92-101 (1996). DOI: 10.1145/234173.234210
Robert E. Schapire, Yoav Freund, Peter Bartlett, Wee Sun Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. Annals of Statistics, vol. 26, pp. 1651-1686 (1998). DOI: 10.1214/AOS/1024691352