Author: George Forman
Keywords:
Abstract: Good feature selection is essential for text classification, both to make it tractable for machine learning and to improve classification performance. This study benchmarks the performance of twelve feature selection metrics across 229 text classification problems drawn from Reuters, OHSUMED, TREC, etc., using Support Vector Machines. The results are analyzed for various objectives. For best accuracy, F-measure or recall, the findings reveal an outstanding new feature selection metric, "Bi-Normal Separation" (BNS). For precision alone, however, Information Gain (IG) was superior. A new evaluation methodology is offered that focuses on the needs of the data mining practitioner who seeks to choose one or two metrics to try that are most likely to have the best performance on the single dataset at hand. This analysis determined, for example, that IG and Chi-Squared have correlated failures for precision, and so IG paired with BNS is a better choice.
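The abstract names BNS and IG without defining them. As a minimal sketch, assuming the commonly cited formulations (BNS as the absolute difference of the inverse standard normal CDF applied to a term's true positive rate and false positive rate; IG as the reduction in class entropy from observing whether the term is present), the two scores for one term can be computed from four counts. The function names, the count arguments, and the clipping constant `eps = 0.0005` are illustrative assumptions, not taken from this abstract.

```python
from math import log2
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def bns(tp: int, fp: int, pos: int, neg: int, eps: float = 0.0005) -> float:
    """Bi-Normal Separation for one term (sketch under assumed definition).

    tp: positive documents containing the term
    fp: negative documents containing the term
    pos, neg: total positive / negative documents
    Rates are clipped to [eps, 1 - eps] so inv_cdf stays finite;
    eps is an assumed value.
    """
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(_STD_NORMAL.inv_cdf(tpr) - _STD_NORMAL.inv_cdf(fpr))


def _entropy(x: int, y: int) -> float:
    """Binary entropy in bits of a group split into x and y items."""
    total = x + y
    h = 0.0
    for count in (x, y):
        if count:  # 0 * log(0) is taken as 0
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(tp: int, fp: int, pos: int, neg: int) -> float:
    """Class entropy minus expected entropy after splitting on the term."""
    n = pos + neg
    fn, tn = pos - tp, neg - fp  # documents lacking the term
    p_word = (tp + fp) / n
    conditional = 0.0
    if tp + fp:
        conditional += p_word * _entropy(tp, fp)
    if fn + tn:
        conditional += (1 - p_word) * _entropy(fn, tn)
    return _entropy(pos, neg) - conditional


if __name__ == "__main__":
    # Hypothetical counts: 100 positive and 900 negative documents.
    for name, tp, fp in [("term_a", 40, 10), ("term_b", 10, 10)]:
        print(name, round(bns(tp, fp, 100, 900), 3),
              round(information_gain(tp, fp, 100, 900), 4))
```

To select features, one would score every vocabulary term with one metric and keep the top k before training the classifier. The clipping step matters for BNS because a term absent from every negative document would otherwise give an infinite score.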