Authors: Ana G. Maguitman, Filippo Menczer, Fulya Erdinc, Heather Roinestad, Alessandro Vespignani
DOI: 10.1007/S11280-006-8562-2
Keywords:
Abstract: Automatic extraction of semantic information from text and links in Web pages is key to improving the quality of search results. However, the assessment of automatic semantic measures is limited by the coverage of user studies, which do not scale with the size, heterogeneity, and growth of the Web. Here we propose to leverage human-generated metadata, namely topical directories, to measure semantic relationships among massive numbers of pairs of Web pages or topics. The Open Directory Project classifies millions of URLs in a topical ontology, providing a rich source from which semantic relationships between Web pages can be derived. While semantic similarity measures based on taxonomies (trees) are well studied, the design of well-founded similarity measures for objects stored in the nodes of arbitrary ontologies (graphs) is an open problem. This paper defines an information-theoretic measure of semantic similarity that exploits both the hierarchical and non-hierarchical structure of an ontology. An experimental study shows that this measure improves significantly on the traditional taxonomy-based approach. This novel measure allows us to address the general question of how text and link analyses can be combined to derive measures of relevance that are in good agreement with semantic similarity. Surprisingly, the traditional use of text similarity turns out to be ineffective for relevance ranking.