Authors: Hwanjo Yu, Jiawei Han, K.C. Chang
DOI: 10.1109/TKDE.2004.1264823
Keywords:
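Abstract: Web page classification is one of the essential techniques for Web mining because classifying Web pages of an interesting class is often the first step of mining the Web. However, constructing a classifier for an interesting class requires laborious preprocessing such as collecting positive and negative training examples. For instance, in order to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of nonhomepages (negative examples). In particular, collecting negative training examples requires arduous work and caution to avoid bias. The paper presents a framework, called positive example based learning (PEBL), for Web page classification which eliminates the need for manually collecting negative training examples in preprocessing. The PEBL framework applies an algorithm, called mapping-convergence (M-C), to achieve classification accuracy (with positive and unlabeled data) as high as that of a traditional SVM (with positive and negative data). M-C runs in two stages: the mapping stage and the convergence stage. In the mapping stage, the algorithm uses a weak classifier that draws an initial approximation of "strong" negative data. Based on the initial approximation, the convergence stage iteratively runs an internal classifier (e.g., SVM) which maximizes margins to progressively improve the approximation of negative data. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. We present the M-C algorithm with supporting theoretical and experimental justifications. Our experiments show that, given the same set of positive examples, the M-C algorithm outperforms one-class SVMs, and it is almost as accurate as the traditional SVMs.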
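The two-stage M-C loop summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are reduced to token sets; the mapping-stage weak classifier uses a hypothetical document-frequency rule (a token counts as a "positive feature" if it appears in a larger fraction of positive than unlabeled documents); and a simple nearest-centroid scorer is swapped in for the margin-maximizing SVM the abstract names as the internal classifier. The function names `positive_features`, `centroid`, `train_nearest_centroid`, and `mapping_convergence` are illustrative only.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Doc = Set[str]  # assumption: a Web page reduced to its set of tokens


def positive_features(pos: List[Doc], unl: List[Doc]) -> Set[str]:
    """Hypothetical mapping-stage rule: a token is a 'positive feature' if it
    occurs in a larger fraction of positive documents than unlabeled ones."""
    p_df = Counter(t for d in pos for t in d)
    u_df = Counter(t for d in unl for t in d)
    return {t for t in p_df if p_df[t] / len(pos) > u_df[t] / len(unl)}


def centroid(docs: List[Doc]) -> Dict[str, float]:
    """Fraction of documents in `docs` that contain each token."""
    df = Counter(t for d in docs for t in d)
    return {t: c / len(docs) for t, c in df.items()}


def train_nearest_centroid(pos: List[Doc], neg: List[Doc]) -> Callable[[Doc], bool]:
    """Stand-in for the SVM internal classifier (assumption): score a document
    against each class centroid; predict positive unless the negative score wins."""
    cp, cn = centroid(pos), centroid(neg)

    def is_positive(doc: Doc) -> bool:
        sp = sum(cp.get(t, 0.0) for t in doc)
        sn = sum(cn.get(t, 0.0) for t in doc)
        return sp >= sn  # ties stay on the positive side

    return is_positive


def mapping_convergence(pos: List[Doc], unl: List[Doc]) -> Tuple[List[Doc], List[Doc]]:
    """Return (negatives found, unlabeled documents still judged positive)."""
    if not pos or not unl:
        return [], list(unl)
    pf = positive_features(pos, unl)
    # Mapping stage: unlabeled documents sharing no positive feature are
    # taken as the initial "strong" negatives.
    neg = [d for d in unl if not (d & pf)]
    rest = [d for d in unl if d & pf]
    if not neg:
        return [], rest
    # Convergence stage: retrain on P vs. all negatives found so far, move
    # newly rejected documents into the negative set, stop when none move.
    while rest:
        clf = train_nearest_centroid(pos, neg)
        new_neg = [d for d in rest if not clf(d)]
        if not new_neg:
            break
        neg.extend(new_neg)
        rest = [d for d in rest if clf(d)]
    return neg, rest
```

Because each round can only move documents from the remaining unlabeled set into the negative set, the loop stops after at most as many rounds as there are unlabeled documents. The convergence toward the true positive-class boundary that the abstract attributes to margin maximization is a property of the SVM, which this nearest-centroid stand-in does not reproduce.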