Authors: Judith Klavans, Andrew Philpot, Eduard Hovy, Ulrich Germann, Samuel Popper
Keywords:
Abstract: Metadata descriptions of database contents are required to build and use systems that access and deliver data in response to user requests. When numerous heterogeneous databases are brought together in a single system, their various metadata formalizations must be homogenized and integrated in order to support the planning and delivery system. This integration is a tedious process that requires human expertise and attention. In this paper we describe a method of speeding up the formalization and integration of new metadata. The method takes advantage of the fact that databases are often described in web pages containing natural language glossaries that define pertinent aspects of the data. Given a root URL, our method identifies likely glossaries, extracts and formalizes the relevant concepts defined in them, and automatically integrates the new formalized concepts into a large model of the domain and associated conceptualizations. The demo will show the end-to-end performance of the system. The demonstration will show the concept acquisition and placement process. Using the AskCal interface (Philpot et al., 2002), we will ask the viewer to interact with the system to retrieve some data. We will then introduce a new topic, one for which there is no data yet in the ontology. We will browse the ontology to verify this. We will then identify a candidate glossary or glossary sites using prior work (Klavans et al., 2002) and/or the web-accessible term finder. In a separate window, we will display the glossary-containing file, from www.eia.gov or a similar site, which contains the text. Using the method described in the accompanying paper, we will enter the URL and activate the analysis and alignment procedures. Upon conclusion, the system will announce as many concepts as it has found. We will then re-display the ontology in the browser. Newly acquired concepts will be displayed in a different color. The viewer will be free to click on them to examine their contents, as well as to click back to the source page for verification. We will thus illustrate not only the latest work, but also a major part of the EDC system we have been building over the past 4 years.
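
The extract-and-place step described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the glossary text, the `Term: definition` line format, the toy ontology, and the head-word placement rule are all assumptions made for illustration, and the function names `extract_entries` and `place_concepts` are hypothetical.

```python
import re

# Hypothetical glossary text (assumption). A real glossary page would first
# need to be fetched and stripped of HTML markup.
GLOSSARY_TEXT = """\
Crude oil: A mixture of hydrocarbons that exists in liquid phase.
Motor gasoline: A complex mixture of relatively volatile hydrocarbons.
Not a glossary line
"""

# Assumed entry format: a term, a colon, whitespace, then the definition.
ENTRY = re.compile(r"^(?P<term>[A-Za-z][A-Za-z \-]*?):\s+(?P<definition>.+)$")


def extract_entries(text):
    """Return {term: definition} for every line matching the assumed format."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries[match.group("term").lower()] = match.group("definition")
    return entries


def place_concepts(entries, ontology):
    """Attach each term under the existing concept named by its last word.

    This head-word rule is a stand-in for a real alignment procedure; terms
    with no matching concept are marked "unplaced".
    """
    placed = {}
    for term in entries:
        head = term.split()[-1]
        placed[term] = head if head in ontology else "unplaced"
    return placed


# Toy domain model (assumption): a flat set of existing concept names.
ONTOLOGY = {"oil", "gasoline", "coal"}

entries = extract_entries(GLOSSARY_TEXT)
placement = place_concepts(entries, ONTOLOGY)
```

Here `placement` maps each extracted term to the existing concept it would be attached under, which corresponds loosely to the newly acquired concepts that the demonstration highlights in the ontology browser.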