CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION

Authors: Hieu Quang Le, Stefan Conrad

Abstract: This paper studies the problem of classifying structured data sources on the Web. While prior works use all features once they are extracted from search interfaces, we further refine the feature set. In our research, each search interface is treated simply as a bag of words. We choose a subset of words suited to classifying web sources, using feature selection methods with new metrics and a novel, simple ranking scheme. Using this aggressive feature selection approach together with a Gaussian process classifier, we obtained high classification performance in an evaluation over real data.
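
The pipeline described in the abstract (bag-of-words interfaces, word ranking, aggressive selection) can be illustrated with a minimal sketch. The abstract does not specify the paper's new metrics or its ranking scheme, so the sketch substitutes the standard chi-square statistic and a plain top-k cut as stand-ins. The function names, the sample interface texts, and the domain labels are all hypothetical, and the Gaussian process classification step is not included.

```python
def bag_of_words(text):
    """Treat an interface as the set of distinct lowercase words it contains."""
    return set(text.lower().split())


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 word/category contingency table.

    n11: interfaces in the category that contain the word
    n10: interfaces outside the category that contain the word
    n01: interfaces in the category that lack the word
    n00: interfaces outside the category that lack the word
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # word occurs everywhere or nowhere: no discriminative power
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k words with the highest chi-square score over all categories.

    Assumption: chi-square and the max-over-categories ranking stand in for
    the paper's own metrics and ranking scheme, which the abstract omits.
    """
    bags = [bag_of_words(d) for d in docs]
    vocab = set().union(*bags)
    categories = set(labels)
    scores = {}
    for w in vocab:
        best = 0.0
        for c in categories:
            n11 = sum(1 for b, y in zip(bags, labels) if w in b and y == c)
            n10 = sum(1 for b, y in zip(bags, labels) if w in b and y != c)
            n01 = sum(1 for b, y in zip(bags, labels) if w not in b and y == c)
            n00 = sum(1 for b, y in zip(bags, labels) if w not in b and y != c)
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[w] = best
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked[:k]


# Hypothetical search-interface texts and domain labels.
docs = [
    "title author isbn search",
    "author title publisher search",
    "departure airline passengers search",
    "airline departure return search",
]
labels = ["books", "books", "airfares", "airfares"]

selected = select_features(docs, labels, k=4)
print(selected)
```

In this toy data the word "search" appears in every interface and therefore scores zero, so it is discarded, while words that occur in only one domain are kept. A small k relative to the vocabulary size is what makes the selection aggressive.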
