An Empirical Study of Sections in Classifying Disease Outbreak Reports

Authors: Nigel Collier, Mike Conway, Son Doan

DOI: 10.1007/978-1-4419-1274-9_4

Abstract: Identifying articles that relate to infectious diseases is a necessary step for any automatic bio-surveillance system that monitors news from the Internet. Unlike scientific articles, which are available in a strongly structured form, news articles are usually loosely structured. In this chapter, we investigate the importance of each section and the effect of section weighting on the performance of text classification. The experimental results show that (1) classification models using the headline and leading sentence achieve high performance in terms of F-score compared with models using other parts of the article; (2) a model using all sections with a bag-of-words representation (full text) achieves the highest recall; and (3) section weighting information can help improve accuracy.
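
The idea of section weighting can be illustrated with a minimal sketch of section-weighted bag-of-words features. It assumes each news article has already been split into a headline, a leading sentence and a body; the section names, the tokenizer and the weight values are hypothetical choices made for illustration, not the authors' implementation or the weights they report. A "headline only" model corresponds to giving the other sections zero weight, and the "full text" bag-of-words model corresponds to equal weights for all sections.

```python
import re
from collections import Counter

# Hypothetical section weights for illustration only; the abstract does not
# report the actual values used in the experiments.
DEFAULT_WEIGHTS = {"headline": 2.0, "lead": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def section_weighted_bow(sections, weights=DEFAULT_WEIGHTS):
    """Build a bag-of-words vector in which each token occurrence counts
    as the weight of the section it appears in.

    sections: dict mapping a section name to its text.
    weights:  dict mapping a section name to its weight; sections that are
              missing or have weight 0 are ignored entirely.
    """
    features = Counter()
    for name, text in sections.items():
        w = weights.get(name, 0.0)
        if w == 0:
            continue  # skip sections excluded from this model
        for tok in tokenize(text):
            features[tok] += w
    return features


if __name__ == "__main__":
    article = {
        "headline": "Bird flu outbreak reported",
        "lead": "Health officials confirmed a bird flu outbreak.",
        "body": "The outbreak affected poultry farms.",
    }
    weighted = section_weighted_bow(article)
    headline_only = section_weighted_bow(article, {"headline": 1.0})
    full_text = section_weighted_bow(
        article, {"headline": 1.0, "lead": 1.0, "body": 1.0}
    )
    print(weighted["outbreak"], sorted(headline_only), full_text["outbreak"])
```

The resulting vectors could then be passed to any standard text classifier. Keeping the weights as a plain dictionary makes it easy to compare section-only, full-text and weighted configurations on the same data.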