Authors: Nigel Collier, Mike Conway, Son Doan
DOI: 10.1007/978-1-4419-1274-9_4
Keywords:
Abstract: Identifying articles that relate to infectious diseases is a necessary step for any automatic bio-surveillance system that monitors news from the Internet. Unlike scientific articles, which are available in a strongly structured form, news articles are usually loosely structured. In this chapter, we investigate the importance of each section and the effect of section weighting on the performance of text classification. The experimental results show that (1) classification models using the headline and leading sentence achieve a high F-score compared to other parts of the article; (2) using all sections with a bag-of-words representation (full text) achieves the highest recall; and (3) section weighting information can help improve accuracy.