Authors: Lifeng Zhou, Hong Wang, Qingsong Xu
DOI: 10.3390/INFO5040634
Keywords: HTML, Web page, Parameter identification problem, Decision tree, Semi-supervised learning, Computer science, Ensemble learning, Machine learning, Artificial intelligence, Data mining, Artificial neural network, Identification (information)
Abstract: To surface the Deep Web, one crucial task is to predict whether a given web page has a search interface (a searchable HyperText Markup Language (HTML) form) or not. Previous studies have focused on supervised classification with labeled examples. However, labeled data are scarce, hard to get, and require tedious manual work, while unlabeled HTML forms are abundant and easy to obtain. In this research, we consider the plausibility of using both labeled and unlabeled data to train better models that identify search interfaces more effectively. We present a semi-supervised co-training ensemble learning approach using both neural networks and decision trees to deal with the search interface identification problem. We show that the proposed model outperforms previous methods that use only labeled data. We also show that adding unlabeled data improves the effectiveness of the proposed model.
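The co-training idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a one-level decision stump stands in for the decision trees, a single logistic neuron stands in for the neural networks, and the three form features (text input count, password input count, presence of the word "search") as well as all names below are hypothetical. Each learner pseudo-labels the unlabeled form it is most confident about and hands it to the other learner; the final prediction averages both learners' scores.

```python
import math

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

class Stump:
    """One-level decision tree (stand-in for a full decision tree)."""
    def fit(self, X, y):
        best = None
        for f in range(len(X[0])):
            vals = sorted({x[f] for x in X})
            # candidate thresholds: midpoints between distinct feature values
            cands = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
            for t in cands:
                for sign in (1, -1):
                    err = sum(int(sign * (x[f] - t) > 0) != lab
                              for x, lab in zip(X, y))
                    if best is None or err < best[0]:
                        best = (err, f, t, sign)
        _, self.f, self.t, self.sign = best
        return self

    def proba(self, x):
        # squash the signed distance to the threshold into a score in (0, 1)
        return sigmoid(self.sign * (x[self.f] - self.t))

class Neuron:
    """Single logistic unit trained by SGD (stand-in for a neural network)."""
    def fit(self, X, y, epochs=200, lr=0.1):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(epochs):
            for x, lab in zip(X, y):
                g = self.proba(x) - lab  # gradient of log loss w.r.t. z
                self.w = [w - lr * g * xi for w, xi in zip(self.w, x)]
                self.b -= lr * g
        return self

    def proba(self, x):
        return sigmoid(self.b + sum(w * xi for w, xi in zip(self.w, x)))

def co_train(X_lab, y_lab, X_unlab, rounds=5, per_round=1):
    # each learner has its own training set, grown by its peer's pseudo-labels
    sets = [(list(X_lab), list(y_lab)), (list(X_lab), list(y_lab))]
    pool = list(X_unlab)
    models = [Stump(), Neuron()]
    for _ in range(rounds):
        if not pool:
            break
        for m, (X, y) in zip(models, sets):
            m.fit(X, y)
        for i, m in enumerate(models):
            # most confident first: score farthest from 0.5
            pool.sort(key=lambda x: abs(m.proba(x) - 0.5), reverse=True)
            peer_X, peer_y = sets[1 - i]
            for x in pool[:per_round]:
                peer_X.append(x)
                peer_y.append(int(m.proba(x) > 0.5))
            del pool[:per_round]
    for m, (X, y) in zip(models, sets):  # final refit on the augmented sets
        m.fit(X, y)
    return models

def predict(models, x):
    # ensemble by averaging the two learners' scores
    return int(sum(m.proba(x) for m in models) / len(models) > 0.5)

# Hypothetical toy features: [text inputs, password inputs, has word "search"]
X_lab = [[1, 0, 1], [2, 0, 1], [1, 1, 0], [2, 2, 0]]
y_lab = [1, 1, 0, 0]  # 1 = searchable form, 0 = non-searchable form
X_unlab = [[1, 0, 1], [2, 1, 0], [3, 0, 1], [1, 2, 0]]
models = co_train(X_lab, y_lab, X_unlab)
```

Calling `predict(models, form_features)` then returns 1 for a form judged searchable and 0 otherwise. Keeping a separate training set per learner is one common co-training variant; the abstract does not say which variant the paper uses.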