Adaptive focused crawling based on link analysis

Authors: Debashis Hati, Biswajit Sahoo, Amritesh Kumar

DOI: 10.1109/ICETC.2010.5529641

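Abstract: A web search engine is designed to search for information on the World Wide Web (WWW). Crawlers are software that traverse the internet and retrieve web pages by following hyperlinks. In the face of a large number of spam websites, traditional web crawlers cannot function well enough to solve this problem. Focused crawlers utilize semantic web technologies to analyze the semantics of hyperlinks and web documents. The focused crawler is a special-purpose search engine that aims to selectively seek out pages relevant to a pre-defined set of topics, rather than to exploit all regions of the Web. It is a program used for searching for information related to some topics of interest on the Internet. The main property of focused crawling is that the crawler does not need to collect all web pages, but selects and retrieves relevant pages only. As the crawler is only a computer program, it cannot determine how relevant a web page is. The major problem is how to retrieve the maximal set of relevant, high-quality pages. In our proposed approach, we calculate the score of an unvisited URL based on its anchor text relevancy, the similarity of its description in the Google search engine with the topic keywords, the similarity of its cohesive text with the topic keywords, and the relevancy score of its parent pages. The relevancy score is calculated based on the vector space model.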
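The scoring idea in the abstract can be illustrated with a minimal sketch. It assumes that each relevancy component is the cosine similarity between term-frequency vectors (one common form of the vector space model) and that the four components are combined as a weighted sum. The function names, the tokenizer, and the equal weights are hypothetical choices made for illustration; the abstract does not state the paper's exact formula or weights.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    vec_a = Counter(tokenize(text_a))
    vec_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction in term space
    return dot / (norm_a * norm_b)


def url_score(topic, anchor_text, description, cohesive_text, parent_score,
              weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine the four signals named in the abstract into one score.

    The weighted sum and the equal default weights are assumptions,
    not the formula from the paper.
    """
    components = (
        cosine_similarity(topic, anchor_text),    # anchor text relevancy
        cosine_similarity(topic, description),    # search-engine description
        cosine_similarity(topic, cohesive_text),  # text surrounding the link
        parent_score,                             # relevancy of parent pages
    )
    return sum(w * c for w, c in zip(weights, components))
```

A crawler built along these lines would typically keep unvisited URLs in a priority queue ordered by `url_score` and fetch the highest-scoring URL first, so that links whose context resembles the topic keywords are visited before unrelated ones.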

References (6)
Soumen Chakrabarti, Martin van den Berg, Byron Dom. Focused crawling: a new approach to topic-specific Web resource discovery. The Web Conference, vol. 31, pp. 1623-1640 (1999). DOI: 10.1016/S1389-1286(99)00052-3
Deepak Singh Tomar, Anshika Pal, S. C. Shrivastava. Effective Focused Crawling Based on Content and Link Structure Analysis. arXiv: Information Retrieval (2009).
Qu Cheng, Wang Beizhan, Wei Pianpian. Efficient focused crawling strategy using combination of link structure and content similarity. International Conference on Information Technology in Medicine and Education, pp. 1045-1048 (2008). DOI: 10.1109/ITME.2008.4744029
Yulian Zhang, Chunxia Yin, Fuyong Yuan. An Application of Improved PageRank in Focused Crawler. Fuzzy Systems and Knowledge Discovery, vol. 2, pp. 331-335 (2007). DOI: 10.1109/FSKD.2007.142
Xiaolin Zheng, Tao Zhou, Zukun Yu, Deren Chen. URL Rule Based Focused Crawler. International Conference on e-Business Engineering, pp. 147-154 (2008). DOI: 10.1109/ICEBE.2008.61
M. Yuvarani, N. Ch. S. N. Iyengar, A. Kannan. LSCrawler: A Framework for an Enhanced Focused Web Crawler Based on Link Semantics. Web Intelligence, pp. 794-800 (2006). DOI: 10.1109/WI.2006.112