A hierarchical approach to model web query interfaces for web source integration

Authors: Eduard C. Dragut, Thomas Kabisch, Clement Yu, Ulf Leser

Keywords: Web modeling, Web page, Query expansion, Information retrieval, Computer science, Data integration, Crawling, Query optimization, Web search query, Tree structure, Web service, Web query classification

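Abstract: Much data in the Web is hidden behind Web query interfaces. In most cases the only means to "surface" the content of a Web database is by formulating complex queries on such interfaces. Applications such as Deep Web crawling and Web database integration require an automatic usage of these interfaces. Therefore, an important problem to be addressed is the automatic extraction of query interfaces into an appropriate model. We hypothesize the existence of a set of domain-independent "commonsense design rules" that guides the creation of Web query interfaces. These rules transform query interfaces into schema trees. In this paper we describe a Web query interface extraction algorithm, which combines HTML tokens and the geometric layout of these tokens within a Web page. Tokens are classified into several classes, out of which the most significant ones are text tokens and field tokens. A tree structure is derived for the text tokens using their geometric layout. Another tree structure is derived for the field tokens. The hierarchical representation of a query interface is obtained by iteratively merging these two trees. Thus, we convert the extraction problem into an integration problem. Our experiments show the promise of our algorithm: it outperforms the previous approaches on extracting query interfaces by about 6.5% in accuracy, as evaluated over three corpora with more than 500 Deep Web interfaces from 15 different domains.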
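
As a rough illustration of the token classification step mentioned in the abstract, the sketch below splits the HTML of a query form into text tokens and field tokens. It is not the authors' algorithm: the tag set `FIELD_TAGS`, the class `TokenClassifier`, and the function `classify` are hypothetical names chosen for this example, and the geometric layout and tree-merging steps are left out entirely.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as "field tokens"; the paper's
# actual token classes are not spelled out in the abstract.
FIELD_TAGS = {"input", "select", "textarea"}
# Text nested inside these tags (option labels, default values) is
# ignored so that it is not mistaken for a label.
CONTAINER_FIELDS = {"select", "textarea"}


class TokenClassifier(HTMLParser):
    """Collects (kind, value) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._inside_field = 0  # nesting depth inside select/textarea

    def handle_starttag(self, tag, attrs):
        if tag in FIELD_TAGS:
            name = dict(attrs).get("name") or ""
            self.tokens.append(("field", name))
        if tag in CONTAINER_FIELDS:
            self._inside_field += 1

    def handle_endtag(self, tag):
        if tag in CONTAINER_FIELDS and self._inside_field > 0:
            self._inside_field -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._inside_field == 0:
            self.tokens.append(("text", text))


def classify(html):
    """Return the text and field tokens found in an HTML fragment."""
    parser = TokenClassifier()
    parser.feed(html)
    parser.close()
    return parser.tokens


if __name__ == "__main__":
    form = (
        '<form>Title <input name="title"> Author <input name="author">'
        '<select name="fmt"><option>Any</option></select></form>'
    )
    for kind, value in classify(form):
        print(kind, value)
```

A fuller treatment would also need the rendered position of each token, since the abstract states that the trees are derived from geometric layout rather than from the HTML nesting alone.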

