Extending metadata definitions by automatically extracting and organizing glossary definitions

Authors: Judith Klavans, Andrew Philpot, Eduard Hovy, Ulrich Germann, Samuel Popper

DOI: 10.5555/1123196.1123248

Keywords:

Abstract: Metadata descriptions of database contents are required to build and use systems that access and deliver data in response to user requests. When numerous heterogeneous databases are brought together in a single system, their various metadata formalizations must be homogenized and integrated in order to support the planning and delivery system. This integration is a tedious process that requires human expertise and attention. In this paper we describe a method of speeding up the formalization of new metadata. The method takes advantage of the fact that databases are often described in web pages containing natural language glossaries that define pertinent aspects of the data. Given a root URL, our method identifies likely glossaries, extracts and formalizes the relevant concepts defined in them, and automatically integrates the formalized concepts into a large model of the domain and associated conceptualizations. The demo will show the end-to-end performance of the system.

The demonstration will show the concept acquisition and placement process. Using the AskCal interface (Philpot et al., 2002), we will ask the viewer to interact with the system to retrieve some data. We will then introduce a new topic, one for which there is no data yet in the ontology. We will browse the ontology to verify this. We will then identify a candidate glossary or glossary sites using prior work (Klavans 2002) and/or a web-accessible term finder. In a separate window, we will display a glossary-containing file from www.eia.gov or similar, which [...] text. [...] accompanying paper, we will enter the URL and activate the analysis and alignment procedures. Upon conclusion, the system will announce as many new concepts as it has found. We will then re-display the ontology browser. Newly acquired concepts will be displayed in a different color. The viewer will be free to click on them to examine their contents, as well as hyperclick back to the source page for verification. We will thus illustrate not only the latest work, but a major part of the EDC system we have been building over the past 4 years.
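The extraction step mentioned in the abstract (pulling defined concepts out of a glossary page) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that the glossary is marked up as an HTML definition list (`<dl>`/`<dt>`/`<dd>`), and the names `GlossaryParser`, `extract_glossary`, and `looks_like_glossary`, as well as the `min_entries` threshold, are hypothetical. It uses only Python's standard-library `html.parser`.

```python
from html.parser import HTMLParser


class GlossaryParser(HTMLParser):
    """Collect (term, definition) pairs from <dt>/<dd> markup (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.pairs = []     # finished (term, definition) tuples
        self._tag = None    # "dt" or "dd" while inside one, else None
        self._term = None   # most recent term awaiting a definition
        self._buf = []      # text chunks of the current element

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            # Older pages often omit </dt> and </dd>, so a new item
            # also closes whatever item was still open.
            self._flush()
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd", "dl"):
            self._flush()

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def _flush(self):
        # Join the collected text and normalize runs of whitespace.
        text = " ".join("".join(self._buf).split())
        if self._tag == "dt" and text:
            self._term = text
        elif self._tag == "dd" and text and self._term:
            self.pairs.append((self._term, text))
        self._tag = None
        self._buf = []


def extract_glossary(html):
    """Return the list of (term, definition) pairs found in an HTML string."""
    parser = GlossaryParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


def looks_like_glossary(pairs, min_entries=3):
    """Crude heuristic (an assumption): enough entries suggests a glossary."""
    return len(pairs) >= min_entries


# Illustrative input only; not taken from any real glossary page.
sample = (
    "<dl><dt>Barrel</dt><dd>A unit of volume equal to 42 U.S. gallons.</dd>"
    "<dt>Btu<dd>British thermal  unit.</dl>"
)
for term, definition in extract_glossary(sample):
    print(f"{term}: {definition}")
```

Formalizing the extracted pairs and placing them in an ontology, as the abstract describes, is a separate and much harder step that this sketch does not attempt.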

References (14)
Judith Klavans, Brian Whitman. Extracting taxonomic relationships from on-line definitional sources using LEXING. ACM/IEEE Joint Conference on Digital Libraries, pp. 257-258 (2001). DOI: 10.1145/379437.379675
Judith Klavans, Jose Luis Ambite, Andrew Philpot, Yigal Arens, Eduard Hovy, Walter Bourne, Deniz Saroz. Data Acquisition and Integration in the DGRC's Energy Data Collection Project (2001).
Steve K. Luk, Kevin Knight. Building a large-scale knowledge base for machine translation. National Conference on Artificial Intelligence, pp. 773-778 (1994).
Craig A. Knoblock, Yigal Arens, Chun-Nan Hsu. Query processing in the SIMS information mediator. Morgan Kaufmann Publishers Inc., pp. 82-90 (1997).
José Luis Ambite, Craig A. Knoblock. Flexible and scalable cost-based query planning in mediators: a transformational approach. Artificial Intelligence, vol. 118, pp. 115-161 (2000). DOI: 10.1016/S0004-3702(00)00003-5
Eduard Hovy. Combining and standardizing large-scale, practical ontologies for machine translation and other uses. Language Resources and Evaluation, pp. 535-542 (1998).
Judith L. Klavans, Smaranda Muresan, Samuel D. Popper, Peter T. Davis. Building a terminological database from heterogeneous definitional sources. International Conference on Digital Government Research, pp. 1-4 (2003). DOI: 10.5555/1123196.1123220
Philip Stuart Resnik. Selection and information: a class-based approach to lexical relationships. University of Pennsylvania (1993).
Jose Luis Ambite, Andrew Philpot, Eduard Hovy. DGRC AskCal: natural language question answering for energy time series. International Conference on Digital Government Research, pp. 1-7 (2002). DOI: 10.5555/1123098.1123122