Algorithmic detection of semantic similarity

Authors: Ana G. Maguitman, Filippo Menczer, Heather Roinestad, Alessandro Vespignani

DOI: 10.1145/1060745.1060765

Keywords: Semantic integration, Ontology, Data Web, Ontology (information science), Semantic analytics, Semantic Web Stack, Explicit semantic analysis, Semantic Web Rule Language, Social Semantic Web, Web mining, Semantic similarity, Semantic search, Ranking, Information retrieval, Semantic Web, Metadata, Web page, Semantic technology, Computer science, Semantic computing, Semantic grid, Semantic equivalence

Abstract: Automatic extraction of semantic information from text and links in Web pages is key to improving the quality of search results. However, the assessment of automatic semantic measures is limited by the coverage of user studies, which do not scale with the size, heterogeneity, and growth of the Web. Here we propose to leverage human-generated metadata, namely topical directories, to measure semantic relationships among massive numbers of pairs of Web pages or topics. The Open Directory Project classifies millions of URLs in a topical ontology, providing a rich source from which semantic relationships between Web pages can be derived. While semantic similarity measures based on taxonomies (trees) are well studied, the design of well-founded similarity measures for objects stored in the nodes of arbitrary ontologies (graphs) is an open problem. This paper defines an information-theoretic measure of semantic similarity that exploits both the hierarchical and non-hierarchical structure of an ontology. An experimental study shows that this measure improves significantly on the traditional taxonomy-based approach. This novel measure allows us to address the general question of how text and link analyses can be combined to derive measures of relevance that are in good agreement with semantic similarity. Surprisingly, the traditional use of text similarity turns out to be ineffective for relevance ranking.
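For orientation, the taxonomy-based baseline that the abstract contrasts against can be illustrated with Lin's information-theoretic similarity on a tree (listed in the references below). The following is a minimal sketch only: the toy taxonomy, the page counts, and all function names are hypothetical, and it covers the tree case, not the graph-based measure that the paper itself defines.

```python
import math

# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT = {
    "Top": None,
    "Science": "Top",
    "Arts": "Top",
    "Physics": "Science",
    "Biology": "Science",
}

# Hypothetical counts of pages filed directly under each topic.
PAGES = {"Top": 0, "Science": 2, "Arts": 4, "Physics": 1, "Biology": 1}


def ancestors(topic):
    """Return the path from topic up to the root, topic first."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = PARENT[topic]
    return path


def subtree_prob(topic):
    """Fraction of all pages filed under topic or any of its descendants."""
    total = sum(PAGES.values())
    under = sum(n for t, n in PAGES.items() if topic in ancestors(t))
    return under / total


def lin_similarity(t1, t2):
    """Lin's tree measure: 2*log P(lca) / (log P(t1) + log P(t2))."""
    if t1 == t2:
        return 1.0
    up1 = ancestors(t1)
    # Lowest common ancestor: first ancestor of t2 that is also above t1.
    lca = next(t for t in ancestors(t2) if t in up1)
    denom = math.log(subtree_prob(t1)) + math.log(subtree_prob(t2))
    return 2 * math.log(subtree_prob(lca)) / denom
```

In this toy data, sibling topics such as "Physics" and "Biology" share the informative ancestor "Science" and score above zero, while topics whose only common ancestor is the root score zero. The sketch assumes every topic's subtree holds at least one page, since a zero probability would make the logarithm undefined.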

References (23)
Thomas R. Gruber, A Translation Approach to Portable Ontologies. Knowledge Acquisition, vol. 5 (1993)
Nathalie Japkowicz, Jeannette Janssen, Wangzhong Lu, Evangelos Milios, Node similarity in networked information spaces. Conference of the Centre for Advanced Studies on Collaborative Research, pp. 11- (2001)
Dekang Lin, An Information-Theoretic Definition of Similarity. International Conference on Machine Learning, pp. 296-304 (1998)
Gerard Salton, Michael J. McGill, Introduction to Modern Information Retrieval (1983)
M. M. Kessler, Bibliographic coupling between scientific papers. American Documentation, vol. 14, pp. 10-25 (1963). DOI: 10.1002/ASI.5090140103
Prasanna Ganesan, Hector Garcia-Molina, Jennifer Widom, Exploiting hierarchical domain structure to compute similarity. ACM Transactions on Information Systems, vol. 21, pp. 64-93 (2003). DOI: 10.1145/635484.635487
William B. Frakes, Ricardo Baeza-Yates, Information Retrieval: Data Structures and Algorithms (1992)
F. Menczer, Correlated topologies in citation networks and the Web. European Physical Journal B, vol. 38, pp. 211-221 (2004). DOI: 10.1140/EPJB/E2004-00114-1
Amos Tversky, Features of Similarity. Psychological Review, vol. 84, pp. 327-352 (1977). DOI: 10.1037/0033-295X.84.4.327