Learning Rules that Classify E-Mail

Author: William W. Cohen


Keywords: Machine learning; Small number; Mathematics; Generalization; Artificial intelligence; Classifier (UML); Weighting; Learning classifier system

Abstract: Two methods for learning text classifiers are compared on classification problems that might arise in filtering and filing personal e-mail messages. The first is a "traditional IR" method based on TF-IDF weighting. The second is a new method for learning sets of "keyword-spotting rules" based on the RIPPER rule-learning algorithm. It is demonstrated that both methods obtain significant generalizations from a small number of examples, that both are comparable in generalization performance on problems of this type, and that both are reasonably efficient, even with fairly large training sets. However, the greater comprehensibility of the rules may be advantageous in a system that allows users to extend or otherwise modify a learned classifier.
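To make the contrast between the two representations concrete, here is a minimal sketch in Python. It is not the paper's implementation. The abstract does not say how TF-IDF vectors are turned into a decision, so a nearest-centroid (Rocchio-style) rule with cosine similarity is assumed here. The keyword-spotting rules are written by hand to show the shape of the output RIPPER produces (a disjunction of word conjunctions); they are not learned, and the sketch treats a message as a single bag of words with no separate subject or body fields. All names (`tokenize`, `fit_idf`, `train_centroids`, `RULES`, `TRAIN`) and the toy messages are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()

def fit_idf(docs):
    # idf(w) = log(N / df(w)), where df(w) counts training documents containing w.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {w: math.log(len(docs) / d) for w, d in df.items()}

def tfidf(doc, idf):
    # Raw term frequency times idf; words never seen in training are dropped.
    return {w: c * idf[w] for w, c in Counter(doc).items() if w in idf}

def centroid(vecs):
    # Component-wise mean of sparse vectors stored as dicts.
    total = defaultdict(float)
    for v in vecs:
        for w, x in v.items():
            total[w] += x
    return {w: x / len(vecs) for w, x in total.items()}

def cosine(a, b):
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(examples):
    # examples: list of (message_text, is_positive) pairs.
    docs = [tokenize(text) for text, _ in examples]
    idf = fit_idf(docs)
    pos = [tfidf(d, idf) for d, (_, y) in zip(docs, examples) if y]
    neg = [tfidf(d, idf) for d, (_, y) in zip(docs, examples) if not y]
    return idf, centroid(pos), centroid(neg)

def tfidf_classify(text, model):
    idf, pos_c, neg_c = model
    q = tfidf(tokenize(text), idf)
    # Positive if the message is closer (by cosine) to the positive centroid.
    return cosine(q, pos_c) > cosine(q, neg_c)

# Hypothetical hand-written rule set: each rule is a set of words that must ALL
# appear, and the message is labelled positive if ANY rule matches.
RULES = [{"talk", "seminar"}, {"abstract", "speaker"}]

def rules_classify(text, rules=RULES):
    words = set(tokenize(text))
    return any(rule <= words for rule in rules)

TRAIN = [
    ("seminar talk by visiting speaker abstract attached", True),
    ("talk on rule learning abstract below", True),
    ("lunch meeting moved to noon", False),
    ("budget report due friday", False),
]
model = train_centroids(TRAIN)
for msg in ["speaker abstract for friday talk", "report due friday"]:
    print(msg, tfidf_classify(msg, model), rules_classify(msg))
```

On these toy messages both classifiers label the first message positive and the second negative. The difference the abstract points to is visible in what a user can inspect: the TF-IDF model is a pair of weighted word vectors, while `RULES` is a short list a user could read and edit directly, for example by adding `{"colloquium"}` as a new rule.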
