Authors: Ralph Grishman, Silja Huttunen, Roman Yangarber
DOI: 10.1016/S1532-0464(03)00013-3
Keywords: Web crawler, Information retrieval, Outbreak, Information extraction, World Wide Web, Computer science
Abstract: Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler, which gathers relevant documents; an information extraction engine, which converts the individual outbreak events to a tabular database; and a database browser, which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.