Algorithmic Computation and Approximation of Semantic Similarity

Authors: Ana G. Maguitman, Filippo Menczer, Fulya Erdinc, Heather Roinestad, Alessandro Vespignani

DOI: 10.1007/S11280-006-8562-2

Abstract: Automatic extraction of semantic information from text and links in Web pages is key to improving the quality of search results. However, the assessment of automatic semantic measures is limited by the coverage of user studies, which do not scale with the size, heterogeneity, and growth of the Web. Here we propose to leverage human-generated metadata, namely topical directories, to measure semantic relationships among massive numbers of pairs of Web pages or topics. The Open Directory Project classifies millions of URLs in a topical ontology, providing a rich source from which semantic relationships between Web pages can be derived. While semantic similarity measures based on taxonomies (trees) are well studied, the design of well-founded similarity measures for objects stored in the nodes of arbitrary ontologies (graphs) is an open problem. This paper defines an information-theoretic measure of semantic similarity that exploits both the hierarchical and non-hierarchical structure of an ontology. An experimental study shows that this measure improves significantly on the traditional taxonomy-based approach. This novel measure allows us to address the general question of how text and link analyses can be combined to derive measures of relevance that are in good agreement with semantic similarity. Surprisingly, the traditional use of text similarity turns out to be ineffective for relevance ranking.
