Authors: Luis Gravano, Panagiotis G. Ipeirotis, Mehran Sahami
Keywords:
Abstract: The contents of many valuable Web-accessible databases are only available through search interfaces and are hence invisible to traditional Web "crawlers." Recently, commercial Web sites have started to manually organize Web-accessible databases into Yahoo!-like hierarchical classification schemes. Here we introduce QProber, a modular system that automates this classification process by using a small number of query probes, generated by document classifiers. QProber can use a variety of types of classifiers to generate the probes. To classify a database, QProber does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of QProber over collections of real documents, experimenting with different types of document classifiers and retrieval models. We have also tested our system with over one hundred Web-accessible databases. Our experiments show that our system has low overhead and achieves high accuracy across a variety of databases.
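The core idea in the abstract, classifying a database from probe match counts alone, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than QProber's actual procedure: the function name `classify_by_probes`, the `match_count` callback standing in for a database's search interface, the hand-written probes (QProber derives its probes from document classifiers), the toy counts, and the simple share-of-matches threshold `min_share`.

```python
from typing import Callable, Dict, List


def classify_by_probes(
    match_count: Callable[[str], int],
    probes: Dict[str, List[str]],
    min_share: float = 0.4,
) -> List[str]:
    """Return the categories whose probes draw at least min_share of all matches.

    Only the number of matches per probe is used; no documents are retrieved.
    The threshold rule is a simplification assumed for this sketch.
    """
    # Sum the match counts reported for each category's probes.
    totals = {
        category: sum(match_count(query) for query in queries)
        for category, queries in probes.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # No probe matched anything, so nothing can be inferred.
    return sorted(
        category
        for category, count in totals.items()
        if count / grand_total >= min_share
    )


# Hypothetical match counts, standing in for a real search interface.
toy_counts = {"jordan AND bulls": 950, "soccer": 1200, "hepatitis": 30, "cancer": 45}


def toy_match_count(query: str) -> int:
    """Pretend to send a query and read back the reported number of matches."""
    return toy_counts.get(query, 0)


# Hand-written probes, assumed for illustration only.
toy_probes = {
    "Sports": ["jordan AND bulls", "soccer"],
    "Health": ["hepatitis", "cancer"],
}

result = classify_by_probes(toy_match_count, toy_probes)
print(result)
```

With these toy counts the Sports probes account for 2150 of 2225 matches, so only that category passes the threshold. In practice `match_count` would issue the probe to the database's search form and parse the reported hit count.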