Parsing Biomedical Literature

Authors: Matthew Lease, Eugene Charniak

DOI: 10.1007/11562214_6

Abstract: We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1,2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this significantly impacts parse accuracy. To resolve this without requiring in-domain treebank data, we show how existing domain-specific lexical resources may be leveraged to augment PTB training: part-of-speech tags, dictionary collocations, and named entities. Using a state-of-the-art statistical parser [3] as our baseline, our lexically adapted parser achieves a 14.2% reduction in error. With oracle knowledge of named entities, this error reduction improves to 21.2%.
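The observation that a training corpus is "lexically impoverished" relative to a target genre can be illustrated with an unknown-word rate: the fraction of target-domain tokens that never appear in the training text. The following is only a minimal sketch of that idea, not the authors' measurement procedure; the function name `unknown_word_rate` and the two toy sentences are hypothetical and are not drawn from PTB or GENIA.

```python
def unknown_word_rate(train_tokens, test_tokens):
    """Return the fraction of test tokens whose lowercased form
    never occurs among the training tokens."""
    vocab = {tok.lower() for tok in train_tokens}
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok.lower() not in vocab)
    return unseen / len(test_tokens)


# Hypothetical toy data standing in for a newswire training corpus
# and a biomedical target corpus (assumption, for illustration only).
newswire = "the company said the shares rose in the third quarter".split()
biomedical = "the protein kinase activates transcription in the cells".split()

rate = unknown_word_rate(newswire, biomedical)
print(f"unknown-word rate: {rate:.3f}")
```

On real corpora the same computation would be run over tokenized treebank text rather than whitespace-split sentences, and the choice of case folding is itself an assumption that affects the resulting rate.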

References (25)
Stuart Shieber, Rebecca Hwa. Learning probabilistic lexicalized grammars for natural language processing. Harvard University (2001).
Berry De Bruijn, Joel D. Martin. Literature mining in molecular biology (2002).
Daniel Gildea. Corpus Variation and Parser Performance. Empirical Methods in Natural Language Processing (2001).
Yusuke Miyao, Takashi Ninomiya, Jun’ichi Tsujii. Corpus-Oriented Grammar Development for Acquiring a Head-Driven Phrase Structure Grammar from the Penn Treebank. International Joint Conference on Natural Language Processing, pp. 684-693 (2004). DOI: 10.1007/978-3-540-30211-7_72
Eugene Charniak. A maximum-entropy-inspired parser. North American Chapter of the Association for Computational Linguistics, pp. 132-139 (2000).
Eugene Charniak. Statistical parsing with a context-free grammar and word statistics. National Conference on Artificial Intelligence, pp. 598-603 (1997).
Stuart Shieber, Joshua T. Goodman. Parsing inside-out. arXiv: Computation and Language (1998).
Mitch Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, vol. 19, pp. 313-330 (1993). DOI: 10.21236/ADA273556
Chris Buckley. Implementation of the SMART Information Retrieval System. Cornell University (1985).
A. T. McCray, A. C. Browne, S. Srinivasan. Lexical methods for managing variation in biomedical terminologies. Annual Symposium on Computer Application in Medical Care, pp. 235-239 (1994).