Authors: Christian Smith, Arne Jönsson, Henrik Danielsson
DOI:
Keywords:
Abstract: Vector space models benefit from using an outside corpus to train the model. It is, however, unclear what constitutes a good training corpus. We have investigated the effect on summary quality when using various language resources to train a vector space based extraction summarizer. This is done by evaluating the performance of the summarizer utilizing vector spaces built from corpora of different genres, partitioned from the Swedish SUC-corpus. The corpora are also characterized using a variety of lexical measures commonly used in readability studies. The performance of the summarizer is measured by comparing automatically produced summaries to human created gold standard summaries using the ROUGE F-score. Our results show that the genre of the training corpus does not have a significant effect on summary quality. However, evaluating the variance in the F-score between the genres, with the lexical measures as independent variables in a linear regression model, shows that vector spaces created from texts with high syntactic complexity, high word variation, short sentences and few long words produce better summaries.
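
The abstract evaluates summaries by ROUGE F-score but does not say which ROUGE variant was used. As an illustration only, the following minimal sketch computes a unigram (ROUGE-1 style) F-score between an automatic summary and a gold standard summary. The function name `rouge1_f`, the lowercasing, and the whitespace tokenization are assumptions made for this example; it is not the official ROUGE toolkit and not necessarily the authors' setup.

```python
from collections import Counter


def rouge1_f(candidate: str, reference: str) -> float:
    """Unigram-overlap F-score between a candidate and a reference summary.

    Hypothetical helper for illustration: tokens are lowercased and split on
    whitespace, which is an assumption rather than the paper's preprocessing.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand.values())  # matched share of the candidate
    recall = overlap / sum(ref.values())      # matched share of the reference
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = "the cat sat on the mat"
    auto = "the cat"
    print(rouge1_f(auto, gold))
```

In a comparison like the one described, a score of this kind would be computed for each summary produced with each training corpus, and the per-genre scores would then be related to the lexical measures in the regression step.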