Information extraction for enhanced access to disease outbreak reports

Authors: Ralph Grishman, Silja Huttunen, Roman Yangarber

DOI: 10.1016/S1532-0464(03)00013-3

Keywords: Web crawler, Information retrieval, Outbreak, Information extraction, World Wide Web, Computer science

Abstract: Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.
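The extraction-and-browsing idea in the abstract can be illustrated with a minimal sketch. This is not the Proteus-BIO implementation: the word classes, the single regular-expression pattern, the `Outbreak` record, and the sample documents are all hypothetical stand-ins, chosen only to show how patterns over word classes can fill a table whose rows link back to source documents, and how selection and sorting then locate those documents.

```python
import re
from dataclasses import dataclass

# Hypothetical word classes; the abstract does not list the real ones.
WORD_CLASSES = {
    "DISEASE": ["cholera", "ebola", "measles"],
    "LOCATION": ["Uganda", "Peru", "India"],
}

# Hypothetical sample reports, keyed by document id.
DOCS = {
    "doc1": "Health officials reported 40 cases of cholera in Peru this week.",
    "doc2": "There were 12 cases of Ebola confirmed in Uganda.",
    "doc3": "A further 95 cases of cholera were recorded in India.",
}


@dataclass
class Outbreak:
    disease: str
    location: str
    cases: int
    doc_id: str  # link from the table entry back to its source document


def build_pattern(classes):
    """Compile one illustrative pattern: '<N> cases of <DISEASE> ... in <LOCATION>'."""
    disease = "|".join(map(re.escape, classes["DISEASE"]))
    location = "|".join(map(re.escape, classes["LOCATION"]))
    return re.compile(
        rf"(?P<cases>\d+) cases of (?P<disease>{disease})\b"
        rf".*?\bin (?P<location>{location})\b",
        re.IGNORECASE,
    )


def extract(docs, pattern):
    """Turn each pattern match into one row of the outbreak table."""
    table = []
    for doc_id, text in docs.items():
        for m in pattern.finditer(text):
            table.append(
                Outbreak(m["disease"].lower(), m["location"], int(m["cases"]), doc_id)
            )
    return table


def select(table, **criteria):
    """Database-style selection: keep rows whose fields equal the given values."""
    return [row for row in table if all(getattr(row, k) == v for k, v in criteria.items())]


table = extract(DOCS, build_pattern(WORD_CLASSES))
cholera = sorted(select(table, disease="cholera"), key=lambda r: r.cases, reverse=True)
for row in cholera:
    print(row.doc_id, row.location, row.cases)
```

Selecting on `disease` and sorting by `cases` returns the matching rows together with their `doc_id` values, which is the sense in which the table acts as an index into the document collection. A real system would need many patterns and much larger word classes, which is the preparation cost the abstract says automated discovery tools reduce.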
