Authors: Matthew Lease, Eugene Charniak
DOI: 10.1007/11562214_6
Keywords:
Abstract: We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1,2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this significantly impacts parse accuracy. To resolve this without requiring in-domain treebank data, we show how existing domain-specific lexical resources may be leveraged to augment PTB training: part-of-speech tags, dictionary collocations, and named entities. Using a state-of-the-art statistical parser [3] as our baseline, our lexically adapted parser achieves a 14.2% reduction in error. With oracle knowledge of named entities, the error reduction improves to 21.2%.
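The reported figures are most likely relative error reductions: the share of the baseline parser's errors that the adapted parser removes, rather than an absolute gain in accuracy points. The sketch below illustrates that arithmetic under an assumption the abstract does not state, namely that error is defined as one minus the parser's labeled-bracket F-score. The function name `relative_error_reduction` and the F-score values are hypothetical and are not results from the paper.

```python
def relative_error_reduction(baseline_f: float, adapted_f: float) -> float:
    """Return the fraction of the baseline's error removed by the adapted parser.

    Assumption (not from the paper): error = 1 - F-score, with F-scores in [0, 1].
    """
    if not (0.0 <= baseline_f < 1.0 and 0.0 <= adapted_f <= 1.0):
        raise ValueError("F-scores must lie in [0, 1] and the baseline must be below 1")
    baseline_error = 1.0 - baseline_f
    adapted_error = 1.0 - adapted_f
    # Relative reduction: how much of the original error disappeared.
    return (baseline_error - adapted_error) / baseline_error


# Hypothetical F-scores, chosen only to illustrate the arithmetic.
baseline, adapted = 0.80, 0.84
print(f"{relative_error_reduction(baseline, adapted):.1%}")
```

With these illustrative values, a 4-point F-score gain removes one fifth of the remaining error, so a relative reduction such as 14.2% can correspond to a much smaller absolute change in F-score.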