Author: William W. Cohen
DOI:
Keywords: Machine learning, Small number, Mathematics, Generalization, Artificial intelligence, Classifier (UML), Weighting, Learning classifier system
Abstract: Two methods for learning text classifiers are compared on classification problems that might arise in filtering and filing personal e-mail messages: a "traditional IR" method based on TF-IDF weighting, and a new method for learning sets of "keyword-spotting rules" based on the RIPPER rule learning algorithm. It is demonstrated that both methods obtain significant generalizations from a small number of examples, that both methods are comparable in generalization performance on problems of this type, and that both methods are reasonably efficient, even with fairly large training sets. However, the greater comprehensibility of the rules may be advantageous in a system that allows users to extend or otherwise modify a learned classifier.
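To make the contrast between the two representations concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the exact TF-IDF variant, so the first half uses a generic nearest-centroid (Rocchio-style) classifier with `log(N / df)` IDF and cosine similarity. The second half applies a keyword-spotting rule set, where each rule is a conjunction of words that must all appear and the set fires if any rule matches. The rules are written by hand, not learned; the RIPPER learning step itself is not shown. All function names, the example messages, and the rules are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace.
    return text.lower().split()

def idf_table(docs):
    """Inverse document frequency: log(N / df) for every word seen in docs."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(tokenize(d)))
    return {w: math.log(n / c) for w, c in df.items()}

def tfidf(text, idf):
    """Sparse TF-IDF vector normalised to unit length; unseen words get weight 0."""
    tf = Counter(tokenize(text))
    vec = {w: c * idf.get(w, 0.0) for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}

def cosine(a, b):
    # Dot product of sparse vectors (inputs are already unit length or centroids).
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return {w: x / len(vectors) for w, x in total.items()}

def tfidf_classify(text, idf, pos_c, neg_c):
    # Assign the class whose centroid is more similar to the message.
    x = tfidf(text, idf)
    return cosine(x, pos_c) > cosine(x, neg_c)

def rules_classify(rules, text):
    # Keyword-spotting rule set: a disjunction of conjunctions of word tests.
    words = set(tokenize(text))
    return any(all(w in words for w in rule) for rule in rules)

# Hypothetical labelled messages (True = "talk announcement").
train = [
    ("meeting agenda for the talk friday", True),
    ("seminar talk abstract and speaker", True),
    ("cheap pills buy now", False),
    ("lunch plans this week", False),
]
idf = idf_table([t for t, _ in train])
pos_c = centroid([tfidf(t, idf) for t, y in train if y])
neg_c = centroid([tfidf(t, idf) for t, y in train if not y])

# Hypothetical hand-written rules of the "keyword-spotting" form.
rules = [("talk",), ("seminar",), ("meeting", "agenda")]

print(tfidf_classify("talk on friday", idf, pos_c, neg_c))   # True
print(tfidf_classify("buy cheap pills", idf, pos_c, neg_c))  # False
print(rules_classify(rules, "talk on friday"))               # True
print(rules_classify(rules, "agenda for lunch"))             # False
```

The rule set can be read and edited directly (for example, adding `("colloquium",)`), whereas the TF-IDF classifier's behaviour is spread across many real-valued weights. This is the comprehensibility difference the abstract points to.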