Author: Pat Langley
DOI: 10.21236/ADA292575
Keywords: Feature selection, Computer science, Heuristic, Machine learning, k-nearest neighbors algorithm, Winnow, Sample complexity, Artificial intelligence
Abstract: In this paper, we review the problem of selecting relevant features for use in machine learning. We describe this problem in terms of heuristic search through a space of feature sets, and identify four dimensions along which approaches to the problem can vary. We then consider recent work on feature selection within this framework, and close with some challenges for future work in the area.

1. The Problem of Irrelevant Features

… accuracy) grow slowly with the number of irrelevant attributes. Theoretical results for algorithms that search restricted hypothesis spaces are encouraging. For instance, the worst-case number of errors made by Littlestone's (1987) WINNOW method grows only logarithmically with the number of irrelevant features. Pazzani and Sarrett's (1992) average-case analysis of WHOLIST, a simple conjunctive algorithm, and Langley and Iba's (1993) treatment of the naive Bayesian classifier, suggest that their sample complexities grow at most linearly with the number of irrelevant features.

However, the theoretical results are less optimistic for induction methods that search a larger space of concept descriptions. For example, Langley and Iba's (1993) analysis of the simple nearest neighbor algorithm indicates that its sample complexity grows exponentially with the number of irrelevant attributes, even for conjunctive target concepts. Experimental studies of nearest neighbor are consistent with this conclusion, and other experiments suggest that similar results hold even for induction algorithms that explicitly select features: the sample complexity of decision-tree methods appears to grow linearly with the number of irrelevants for conjunctive concepts, but exponentially for parity concepts, since the evaluation metric cannot distinguish relevant from irrelevant features in the latter situation (Langley & Sage, in press).

Results of this sort have encouraged machine learning researchers to explore more sophisticated approaches to feature selection. In the sections that follow, we present a general framework for this task and consider examples of recent work on this important problem.
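To make the WINNOW result concrete, here is a minimal sketch of a standard multiplicative promote/demote variant of Littlestone's (1987) algorithm. The toy data set, the parameter names, and the alpha = 2 update factor are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of Littlestone's (1987) WINNOW algorithm; the toy
# disjunctive data set below is hypothetical, not an experiment from
# the paper.
import random

def winnow(examples, n_features, alpha=2.0):
    """Online Winnow over 0/1 feature vectors with 0/1 labels."""
    w = [1.0] * n_features        # one positive weight per feature
    theta = float(n_features)     # fixed threshold, as in Littlestone (1987)
    mistakes = 0
    for x, y in examples:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
        if pred != y:
            mistakes += 1
            for i in range(n_features):
                if x[i] == 1:
                    # promote on a missed positive, demote on a false alarm
                    w[i] = w[i] * alpha if y == 1 else w[i] / alpha
    return w, mistakes

# Toy target: a disjunction of the first two of 100 Boolean features,
# so 98 of the features are irrelevant.
random.seed(0)
n = 100
examples = []
for _ in range(500):
    x = [random.randint(0, 1) for _ in range(n)]
    examples.append((x, 1 if (x[0] or x[1]) else 0))

w, mistakes = winnow(examples, n)
print(mistakes)  # stays far below 500 despite the 98 irrelevant features
```

Because each mistake rescales only the weights of active features, the number of errors on a monotone disjunction grows with the logarithm of the feature count, which is the robustness property the text attributes to WINNOW.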
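The abstract's framing of feature selection as heuristic search through a space of feature sets can likewise be sketched as greedy forward search. Everything here, including the wrapper-style leave-one-out 1-nearest-neighbor score and the stopping rule, is an illustrative assumption rather than the paper's own method.

```python
# A sketch of feature selection as greedy forward search through the
# space of feature subsets; the scoring function is a hypothetical choice.
import random

def loo_1nn_accuracy(data, labels, feats):
    """Leave-one-out accuracy of 1-NN restricted to the given features."""
    if not feats:
        return 0.0
    correct = 0
    for i, (xi, yi) in enumerate(zip(data, labels)):
        best_dist, best_label = float("inf"), None
        for j, (xj, yj) in enumerate(zip(data, labels)):
            if i == j:
                continue
            dist = sum((xi[f] - xj[f]) ** 2 for f in feats)
            if dist < best_dist:
                best_dist, best_label = dist, yj
        correct += best_label == yi
    return correct / len(data)

def forward_select(data, labels, n_features):
    """Greedily grow a feature set while the score keeps improving."""
    selected, best_score = [], 0.0
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        score, feat = max((loo_1nn_accuracy(data, labels, selected + [f]), f)
                          for f in candidates)
        if score <= best_score:   # halt when no successor improves the score
            break
        selected.append(feat)
        best_score = score
    return selected, best_score

# Toy demo: a conjunctive target over features 0 and 3; the other six
# features are irrelevant.
random.seed(1)
data = [[random.randint(0, 1) for _ in range(8)] for _ in range(60)]
labels = [1 if (x[0] and x[3]) else 0 for x in data]
print(forward_select(data, labels, 8))  # typically recovers features 0 and 3
```

The sketch's starting point (the empty set), search organization (greedy hill climbing), evaluation function (wrapper accuracy), and halting criterion (no improving successor) correspond roughly to the four dimensions along which the paper says approaches to the problem can vary.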