Authors: Hermann Kroll, Judy Al-Chaar, Wolf-Tilo Balke, Anita de Waard, Yuanxi Fu
DOI:
Keywords:
Abstract: A central challenge for digital libraries is to provide effective access paths to ever-growing collections of mostly textual, i.e., unstructured, information. The traditional yet expensive way to manage, categorize, and annotate such collections is extensive manual metadata curation that semantically enriches library items. The ability to convert textual information automatically into a structured representation would be extremely beneficial, allowing for novel access paths and supporting semantically meaningful discovery. This paper investigates the opportunities and challenges that the latest techniques for open information extraction offer for digital libraries. Open information extraction promises to work out of the box and does not require domain-specific training data. To assess how well such tools perform, we conduct a qualitative evaluation in two domains: general news and biomedicine. Our research shows current benefits but also reveals serious challenges for practical applications. In particular, three research questions still have to be addressed before open information extraction can be used reliably in digital library projects.