Authors: Parisa Kordjamshidi, Dan Roth, Marie-Francine Moens
DOI: 10.1186/S12859-015-0542-Z
Keywords: Natural language, Structure (mathematical logic), Biomedical text mining, Spatial analysis, Spatial relation, Information retrieval, Task (project management), Computer science, Web page, Structured prediction
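Abstract: We aim to automatically extract species names of bacteria and their locations from webpages. This task is important for exploiting the vast amount of biological knowledge which is expressed in diverse natural language texts and putting this knowledge in databases for easy access by biologists. The task is challenging and the previous results are far below an acceptable level of performance, particularly for the extraction of localization relationships. Therefore, we aim to design a new system for such extractions, using the framework of structured machine learning techniques. We design a new model for the joint extraction of biomedical entities and the localization relationship. Our model is based on a spatial role labeling (SpRL) model designed for spatial understanding of unrestricted text. We extend SpRL to extract discourse level spatial relations in the biomedical domain and apply it to the BioNLP-ST 2013, BB-shared task. We highlight the main differences between general spatial language understanding and spatial information extraction from scientific text, which is the focus of this work. We exploit the text’s structure and discourse level global features. Our model and the designed features substantially improve on the previous systems, achieving an absolute improvement of approximately 57 percent over the F1 measure of the best previous system for this task. Our experimental results indicate that a joint learning model over all entities and relationships in a document outperforms a model which extracts entities and relationships independently. Our global learning model significantly improves the state-of-the-art results on this task and has a high potential to be adopted in other natural language processing (NLP) tasks in the biomedical domain.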