Deep Web Search Interface Identification: A Semi-Supervised Ensemble Approach

Authors: Lifeng Zhou, Hong Wang, Qingsong Xu

DOI: 10.3390/INFO5040634

Keywords: HTML, Web page, Parameter identification problem, Decision tree, Semi-supervised learning, Computer science, Ensemble learning, Machine learning, Artificial intelligence, Data mining, Artificial neural network, Identification (information)

Abstract: To surface the Deep Web, one crucial task is to predict whether a given web page has a search interface (a searchable HyperText Markup Language (HTML) form) or not. Previous studies have focused on supervised classification with labeled examples. However, labeled data are scarce, hard to get, and require tedious manual work, while unlabeled HTML forms are abundant and easy to obtain. In this research, we consider the plausibility of using both labeled and unlabeled data to train better models to identify search interfaces more effectively. We present a semi-supervised co-training ensemble learning approach using both neural networks and decision trees to deal with the search interface identification problem. We show that the proposed model outperforms previous methods using only labeled data. We also show that adding unlabeled data improves the effectiveness of the proposed model.
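As a rough illustration of the input side of this task, the sketch below turns each HTML form on a page into a small set of counts that a classifier could consume. It is not the authors' feature set: the class `FormFeatureParser`, the function `extract_form_features`, and the chosen features (text inputs, password inputs, select lists, submit buttons, form method) are hypothetical and assumed only for this example. It uses Python's standard `html.parser` module and does no classification itself.

```python
from html.parser import HTMLParser


class FormFeatureParser(HTMLParser):
    """Collects simple per-form counts (hypothetical features, for illustration)."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one feature dict per <form> element
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "method": (attrs.get("method") or "get").lower(),
                "text_inputs": 0,
                "password_inputs": 0,
                "selects": 0,
                "submits": 0,
            }
            self.forms.append(self._current)
        elif self._current is not None:
            if tag == "input":
                # An <input> without a type attribute defaults to a text field.
                kind = (attrs.get("type") or "text").lower()
                if kind in ("text", "search"):
                    self._current["text_inputs"] += 1
                elif kind == "password":
                    self._current["password_inputs"] += 1
                elif kind in ("submit", "image"):
                    self._current["submits"] += 1
            elif tag == "select":
                self._current["selects"] += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def extract_form_features(html):
    """Return a list of feature dicts, one per form found in the page."""
    parser = FormFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.forms


# Made-up sample page: one search-like form and one login form.
page = """
<form action="/search" method="get">
  <input type="text" name="q">
  <select name="category"><option>Books</option></select>
  <input type="submit" value="Search">
</form>
<form action="/login" method="post">
  <input type="text" name="user">
  <input type="password" name="pw">
  <input type="submit" value="Log in">
</form>
"""
features = extract_form_features(page)
```

In a pipeline like the one the abstract describes, feature vectors of this kind would be computed for both labeled and unlabeled forms before any learner is trained. Here the password field is one plausible signal that separates the login form from the search-like form.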
