Formal concept analysis approach for data extraction from a limited deep web database

Authors: Zhuo Zhang, Juan Du, Liming Wang

DOI: 10.1007/S10844-013-0242-Y

Keywords: Pruning (decision trees); Cardinality (SQL statements); Computer science; View; Database; Query optimization; Web query classification; Web search query; Formal concept analysis; Data extraction; Theoretical computer science; Data mining

Abstract: Few studies have addressed the problem of extracting data from a limited deep web database. We apply formal concept analysis to this problem and propose a novel algorithm called EdaliwdbFCA. Before a query Y is sent, the algorithm analyzes the local formal context K_L, which consists of the latest extracted data, and predicts the size of the query results according to the cardinality of the extent X of the formal concept (X,Y) derived from K_L. Thus, it can be determined in advance whether or not to send the query. Candidate concepts are dynamically generated from the lower cover of the current concept (X,Y). Therefore, the method avoids building concrete concept lattices during extraction. Moreover, two pruning rules are adopted to reduce redundant queries. Experiments on controlled data sets and real applications were performed. The results confirm that the theories are correct and can be effectively applied in the real world.
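The derivation step the abstract refers to can be illustrated with the standard formal concept analysis operators. The following is a minimal sketch, not the EdaliwdbFCA algorithm itself: the local context, the record identifiers, the attribute names, and the function names are all hypothetical. It only shows how a query Y, treated as a set of attributes, yields a concept (X,Y) in a small local context, and how the cardinality of the extent X is read off. How the paper turns that cardinality into a send-or-skip decision is not stated in this record.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A formal context as a mapping: object id -> set of attributes it has.
Context = Dict[str, FrozenSet[str]]


def extent(context: Context, attrs: FrozenSet[str]) -> Set[str]:
    """All objects in the context that have every attribute in attrs."""
    return {obj for obj, has in context.items() if attrs <= has}


def intent(context: Context, objects: Set[str]) -> FrozenSet[str]:
    """All attributes shared by every object in objects."""
    if not objects:
        # By convention, the empty object set shares all attributes.
        return frozenset().union(*context.values())
    return frozenset.intersection(*(context[o] for o in objects))


def concept_for_query(context: Context, query: FrozenSet[str]) -> Tuple[Set[str], FrozenSet[str]]:
    """Derive the formal concept (X, Y) generated by a query attribute set."""
    x = extent(context, query)
    y = intent(context, x)  # closure of the query
    return x, y


# Hypothetical local context built from previously extracted records.
local_context: Context = {
    "r1": frozenset({"fiction", "english", "paperback"}),
    "r2": frozenset({"fiction", "english", "hardcover"}),
    "r3": frozenset({"fiction", "french", "paperback"}),
    "r4": frozenset({"science", "english", "paperback"}),
}

x, y = concept_for_query(local_context, frozenset({"fiction", "english"}))
predicted_local_size = len(x)  # cardinality of the extent X
```

Here the query `{"fiction", "english"}` matches two local records, so the extent has cardinality 2. A query such as `{"french"}` closes to a larger intent because the only matching record also carries `fiction` and `paperback`.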
