Author: Ron Kohavi
DOI:
Keywords:
Abstract: In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two are related and are studied under the wrapper approach. The hypothesis spaces are decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs). For accuracy estimation, we investigate cross-validation and the .632 bootstrap. We show examples where they fail and conduct a large-scale study comparing them. We conclude that repeated runs of five-fold cross-validation give a good tradeoff between bias and variance for the problem of model selection used in later chapters. We define the wrapper approach and use it to relate definitions of relevancy to the set of optimal features, which is defined with respect to both a concept and an induction algorithm. The wrapper approach requires a search space, operators, a search engine, and an evaluation function. We investigate all of them in detail and introduce compound operators for feature subset selection. Finally, we abstract the search problem into search with probabilistic estimates. We introduce decision tables with a default majority rule (DTMs) to test the conjecture that feature subset selection is a very powerful bias. The induced DTMs are surprisingly powerful, and we conclude that this bias is extremely important for many real-world datasets. The resulting decision tables are small and can be succinctly displayed. We study the properties of oblivious read-once decision graphs (OODGs) and show that they do not suffer from some inherent limitations of decision trees. We describe a general framework for constructing OODGs bottom-up and specialize it using the wrapper approach. The induced OODGs used fewer features than those produced by C4.5, a state-of-the-art decision tree algorithm, and were usually easier for humans to comprehend.
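The accuracy-estimation recommendation above (repeated runs of five-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the dissertation's code; the 1-nearest-neighbor learner and the helper names are assumptions chosen to keep the sketch self-contained.

```python
import random
from statistics import mean, stdev

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv_accuracy(X, y, fit, predict, k=5, repeats=3, seed=0):
    """Mean and spread of accuracy over `repeats` independent k-fold
    partitions; repeating reduces the variance of a single k-fold run."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in k_fold_indices(len(X), k, rng):
            held_out = set(test_idx)
            train_X = [x for i, x in enumerate(X) if i not in held_out]
            train_y = [t for i, t in enumerate(y) if i not in held_out]
            model = fit(train_X, train_y)
            hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
            scores.append(hits / len(test_idx))
    return mean(scores), stdev(scores)

# A deliberately simple learner for the sketch: 1-nearest neighbor.
def fit_1nn(X, y):
    return list(zip(X, y))

def predict_1nn(model, x):
    return min(model,
               key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]
```

Calling `repeated_cv_accuracy(X, y, fit_1nn, predict_1nn, k=5, repeats=3)` on any labeled dataset averages accuracy over the fifteen held-out folds, which is the bias/variance tradeoff the abstract recommends for model selection.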
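The wrapper approach pairs a search over feature subsets with the induction algorithm's own estimated accuracy as the evaluation function. Below is a hedged sketch of greedy forward selection wrapped around a DTM-style learner (a lookup table over the selected features with a default majority rule). The toy data, the function names, and the use of training-set accuracy as the evaluation function (the dissertation evaluates subsets with cross-validated accuracy) are all illustrative assumptions.

```python
from collections import Counter

def dtm_fit(X, y, feats):
    """Decision table with a default majority rule: project training
    examples onto `feats`; unseen keys fall back to the majority class."""
    buckets = {}
    for x, label in zip(X, y):
        buckets.setdefault(tuple(x[f] for f in feats), []).append(label)
    table = {k: Counter(v).most_common(1)[0][0] for k, v in buckets.items()}
    default = Counter(y).most_common(1)[0][0]
    return table, default, feats

def dtm_predict(model, x):
    table, default, feats = model
    return table.get(tuple(x[f] for f in feats), default)

def wrapper_forward_select(X, y, n_features):
    """Greedy forward search: repeatedly add the single feature that most
    improves the evaluation function; stop when no addition helps."""
    def evaluate(feats):
        model = dtm_fit(X, y, feats)
        return sum(dtm_predict(model, x) == t for x, t in zip(X, y)) / len(X)

    selected, best, remaining = [], float("-inf"), set(range(n_features))
    while remaining:
        score, feat = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best:  # no strict improvement: stop the search
            break
        best, selected = score, selected + [feat]
        remaining.remove(feat)
    return selected, best
```

On a toy set where the label equals feature 0, the search selects exactly that feature and stops, since adding the irrelevant features yields no strict improvement.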