Author: Martin Emms
DOI:
Keywords:
Abstract: Some alternatives to the standard evalb measures for parser evaluation are considered, principally the use of a tree-distance measure, which assigns a score to a linearity and ancestry respecting mapping between trees, in contrast to the evalb measures, which assign a score to a span preserving mapping. Additionally, analysis of the evalb measures suggests some further variants, concerning different normalisations, the portions of a tree compared, and whether scores should be micro or macro averaged. The outputs of 6 parsing systems on Section 23 of the Penn Treebank were taken. It is shown that the ranking of the parsing systems varies as the alternative measures are used. For a fixed parsing system, it is also shown that the ranking of its parses from best to worst will vary according to whether the evalb or the tree-distance measure is used. It is argued that the tree-distance measure ameliorates a problem that has been noted concerning over-penalisation of attachment errors.
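
The micro versus macro averaging distinction mentioned in the abstract can be illustrated with a minimal sketch. It assumes each parse is represented as a list of labelled brackets `(label, start, end)` and scores them by exact span match in the style of evalb. The function names (`bracket_counts`, `f1`, `micro_f1`, `macro_f1`) and the toy data are hypothetical, chosen for illustration only; this is not the paper's implementation and it does not cover the tree-distance measure.

```python
from collections import Counter

def bracket_counts(gold, test):
    """Return (matched, n_gold, n_test) for two lists of (label, start, end) brackets."""
    g, t = Counter(gold), Counter(test)
    matched = sum((g & t).values())  # multiset intersection: brackets present in both
    return matched, sum(g.values()), sum(t.values())

def f1(matched, n_gold, n_test):
    """Harmonic mean of bracket precision and recall; 0.0 when nothing matches."""
    if matched == 0:
        return 0.0
    precision = matched / n_test
    recall = matched / n_gold
    return 2 * precision * recall / (precision + recall)

def micro_f1(pairs):
    """Pool bracket counts over all sentences, then compute a single F1."""
    totals = [0, 0, 0]
    for gold, test in pairs:
        for i, count in enumerate(bracket_counts(gold, test)):
            totals[i] += count
    return f1(*totals)

def macro_f1(pairs):
    """Compute F1 per sentence, then take the unweighted mean over sentences."""
    scores = [f1(*bracket_counts(gold, test)) for gold, test in pairs]
    return sum(scores) / len(scores)

# Toy data (an assumption for illustration): a short sentence parsed perfectly
# and a longer sentence with half of its brackets wrong.
short_gold = [("S", 0, 2)]
short_test = [("S", 0, 2)]
long_gold = [("S", 0, 6), ("NP", 0, 2), ("VP", 2, 6), ("PP", 4, 6)]
long_test = [("S", 0, 6), ("NP", 0, 2), ("VP", 2, 4), ("PP", 3, 6)]
pairs = [(short_gold, short_test), (long_gold, long_test)]

micro = micro_f1(pairs)
macro = macro_f1(pairs)
```

On this toy data the pooled (micro) score is 0.6, while the per-sentence (macro) mean is 0.75, because macro averaging gives the short, perfectly parsed sentence the same weight as the long one. A difference of this kind is one way that the choice of averaging could change how systems are ranked.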