A study on using two-phase conditional random fields for query interface segmentation

Authors: Yongquan Dong, Xiangjun Zhao, Gongjie Zhang

DOI: 10.1007/978-3-642-23982-3_45

Abstract: Recently, the Web has been rapidly "deepened" by many searchable databases online, where data are hidden behind query interfaces. Automatic processing of a query interface is a must for accessing the invisible contents of the deep Web. This entails automatic segmentation of a query interface, i.e., the task of grouping related components of an interface together. The segmentation is divided into two steps: component labeling and component grouping. In this paper we present a new approach to perform query interface segmentation using two-phase Conditional Random Fields (CRFs). In the first phase, one CRF model is used to tag each component with a semantic label (attribute-name, operator, operand or other); in the second phase, another CRF model is used to create groups of related components. Experiments show that our approach yields high accuracy.
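
The two-phase pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper trains its CRF models from data, whereas here both phases use hand-set, hypothetical scoring functions (`phase1_score`, `phase2_score`) and a plain Viterbi decoder, which is the standard inference step for a linear-chain model. The component representation (`kind`, `text`), the operator word list, and the B/I group tags in the second phase are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Component = Dict[str, str]
ScoreFn = Callable[[str, str, Component], float]  # (previous label, label, observation)


def viterbi(obs: Sequence[Component], labels: List[str], score: ScoreFn) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain model."""
    if not obs:
        return []
    # best[i][y]: best score of any labeling of obs[:i + 1] that ends in label y
    best = [{y: score("START", y, obs[0]) for y in labels}]
    back: List[Dict[str, str]] = []
    for x in obs[1:]:
        row, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + score(p, y, x))
            row[y] = best[-1][prev] + score(prev, y, x)
            ptr[y] = prev
        best.append(row)
        back.append(ptr)
    path = [max(labels, key=lambda y: best[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


PHASE1_LABELS = ["attribute-name", "operator", "operand", "other"]
OPERATOR_WORDS = {"contains", "exact", "starts with"}  # hypothetical word list


def phase1_score(prev: str, label: str, comp: Component) -> float:
    """Hypothetical hand-set scores standing in for learned CRF weights."""
    is_text = comp["kind"] == "text"
    text = comp.get("text", "").lower()
    s = 0.0
    if label == "attribute-name" and is_text and text not in OPERATOR_WORDS:
        s += 1.5
    if label == "operator" and text in OPERATOR_WORDS:
        s += 3.0
    if label == "operand" and not is_text:
        s += 2.0
    if label == "other" and comp["kind"] == "button":
        s += 3.0
    if prev == "attribute-name" and label in ("operator", "operand"):
        s += 0.5  # operators and operands tend to follow an attribute name
    return s


def phase2_score(prev: str, tag: str, comp: Component) -> float:
    """Hypothetical scores for tagging group boundaries: B begins a group, I continues it."""
    starts_group = comp["label"] in ("attribute-name", "other")
    s = 0.0
    if tag == "B" and starts_group:
        s += 2.0
    if tag == "I" and not starts_group:
        s += 2.0
    if prev == "START" and tag == "I":
        s -= 5.0  # a sequence cannot continue a group that never began
    return s


def segment(components: Sequence[Component]) -> Tuple[List[str], List[List[int]]]:
    """Phase 1 labels each component; phase 2 groups components by index."""
    labels = viterbi(components, PHASE1_LABELS, phase1_score)
    tags = viterbi([{"label": l} for l in labels], ["B", "I"], phase2_score)
    groups: List[List[int]] = []
    for i, tag in enumerate(tags):
        if tag == "B" or not groups:
            groups.append([i])
        else:
            groups[-1].append(i)
    return labels, groups


# A made-up query interface: Title [contains] [____]  Author [____]  (Search)
INTERFACE = [
    {"kind": "text", "text": "Title"},
    {"kind": "select", "text": "contains"},
    {"kind": "textbox", "text": ""},
    {"kind": "text", "text": "Author"},
    {"kind": "textbox", "text": ""},
    {"kind": "button", "text": "Search"},
]

labels, groups = segment(INTERFACE)
```

On this made-up interface the first phase labels the six components and the second phase returns three groups of component indices: the title attribute with its operator and operand, the author attribute with its operand, and the search button on its own. Feeding the phase-one labels into the second decoder mirrors the split into component labeling and component grouping; in a trained system both score functions would be replaced by learned feature weights.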
