Towards role-based filtering of disease outbreak reports

Authors: Son Doan, Ai Kawazoe, Mike Conway, Nigel Collier

DOI: 10.1016/J.JBI.2008.12.009

Abstract: This paper explores the role of named entities (NEs) in the classification of disease outbreak reports. In the annotation schema of BioCaster, a text mining system for public health protection, important concepts that reflect information about infectious diseases were conceptually analyzed with a formal ontological methodology and classified into types and roles. Types are specified as NE classes, and roles are integrated into NEs as attributes, such as whether a chemical is being used as a therapy for some disease. We focus on the roles of NEs and explore different ways to extract, combine and use them as features in a text classifier. In addition, we investigate the combination of roles with semantic categories of disease-related nouns and verbs. Experimental results using naive Bayes and Support Vector Machine (SVM) algorithms show that: (1) roles in combination with NEs improve performance in text classification, and (2) roles in combination with semantic categories of noun and verb features contribute substantially to the improvement of text classification. Both of these results are statistically significant compared to the baseline "raw text" representation. We discuss in detail the effects of roles on each NE and on the semantic categories of noun and verb features, in terms of accuracy, precision/recall and F-score measures for the text classification task.
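The idea of using NE types and roles as classifier features alongside the raw text can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `role_features`, the feature-string format, and the representation of an annotated entity as a `(class, roles)` pair are all assumptions made for illustration. Only the notions of NE class, role attribute, and the chemical-used-as-therapy example come from the abstract.

```python
from collections import Counter


def role_features(tokens, entities):
    """Build a bag of features from raw tokens, NE classes, and NE roles.

    tokens   -- list of word strings (the "raw text" baseline features)
    entities -- list of (ne_class, roles) pairs, where roles is a dict of
                attribute name -> value (a hypothetical representation)
    """
    # Baseline: one feature per lower-cased word.
    feats = Counter(f"w={t.lower()}" for t in tokens)
    for ne_class, roles in entities:
        # Type feature: the NE class on its own.
        feats[f"ne={ne_class}"] += 1
        # Role features: the NE class combined with each role attribute.
        for role, value in sorted(roles.items()):
            feats[f"ne={ne_class}|{role}={value}"] += 1
    return feats


# Hypothetical example: a chemical annotated as being used as a therapy.
tokens = "Patients received oseltamivir".split()
entities = [("CHEMICAL", {"therapeutic": "true"})]
features = role_features(tokens, entities)
```

The resulting counts could then be passed to any bag-of-features learner, such as a naive Bayes or SVM implementation. Keeping the word, type, and type-plus-role features under separate prefixes makes it straightforward to switch each group on or off when comparing representations.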
