Authors: Debashis Hati, Biswajit Sahoo, Amritesh Kumar
DOI: 10.1109/ICETC.2010.5529641
Keywords:
Abstract: A web search engine is designed to search for information on the World Wide Web (WWW). Crawlers are software that traverse the internet and retrieve pages by following hyperlinks. Faced with a large number of spam websites, traditional crawlers cannot function well enough to solve this problem. Focused crawlers utilize semantic technologies to analyze the semantics of hyperlinks and documents. The focused crawler is a special-purpose crawler that aims to selectively seek out pages that are relevant to a pre-defined set of topics, rather than exploring all regions of the Web. It is a program used for searching for information related to some topics of interest from the Internet. The main property of focused crawling is that the crawler does not need to collect all pages, but selects and retrieves relevant pages only. As the crawler is only a computer program, it cannot determine how relevant a page is. The major problem is how to retrieve the maximal set of relevant, high-quality pages. In our proposed approach, we calculate the score of an unvisited URL based on its anchor text relevancy, the similarity of its description in the Google search engine with the topic keywords, the similarity of its cohesive text with the topic keywords, and the relevancy score of its parent pages. The relevancy score is calculated using the vector space model.
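The scoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the four signals are combined or how terms are weighted, so the following are assumptions made purely for illustration: raw term-frequency vectors (no TF-IDF), cosine similarity as the vector space measure, the mean of parent scores as the parent signal, and equal weights. The function names `term_vector`, `cosine_similarity`, and `url_score` are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text, split on non-alphanumerics, count term frequencies."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no similarity to anything
    return dot / (norm_a * norm_b)


def url_score(topic_keywords, anchor_text, description, cohesive_text,
              parent_scores, weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine the four relevancy signals into one score for an unvisited URL.

    ASSUMPTION: a weighted sum with equal weights; the abstract does not
    specify the combination rule or the weights.
    """
    topic = term_vector(" ".join(topic_keywords))
    anchor = cosine_similarity(term_vector(anchor_text), topic)
    desc = cosine_similarity(term_vector(description), topic)
    cohesive = cosine_similarity(term_vector(cohesive_text), topic)
    # ASSUMPTION: parent relevancy is the mean of the parents' scores.
    parent = sum(parent_scores) / len(parent_scores) if parent_scores else 0.0
    w_anchor, w_desc, w_cohesive, w_parent = weights
    return (w_anchor * anchor + w_desc * desc
            + w_cohesive * cohesive + w_parent * parent)


# Example: score one candidate link for the topic "focused crawler".
score = url_score(
    ["focused", "crawler"],
    anchor_text="focused crawler tutorial",
    description="A guide to building a focused web crawler",
    cohesive_text="read more about crawler design",
    parent_scores=[0.8, 0.6],
)
print(round(score, 3))
```

A crawler built on such a score would typically keep unvisited URLs in a priority queue ordered by score and fetch the highest-scoring URL next; that scheduling detail is likewise an assumption and is not stated in the abstract.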