A good space: Lexical predictors in word space evaluation

Authors: Christian Smith, Arne Jönsson, Henrik Danielsson

Abstract: Vector space models benefit from using an outside corpus to train the model. It is, however, unclear what constitutes a good training corpus. We have investigated the effect on summary quality of using various language resources to train a vector space based extraction summarizer. This is done by evaluating the performance of the summarizer utilizing vector spaces built from corpora of different genres, partitioned from the Swedish SUC-corpus. The corpora are also characterized using a variety of lexical measures commonly used in readability studies. The performance of the summarizer is measured by comparing automatically produced summaries to human created gold standard summaries using the ROUGE F-score. Our results show that the genre of the training corpus does not have a significant effect on summary quality. However, evaluating the variance in the F-score between the genres, with the lexical measures as independent variables in a linear regression model, shows that vector spaces created from texts with high syntactic complexity, high word variation, short sentences and few long words produce better summaries.
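
The abstract reports summary quality as a ROUGE F-score but does not say which ROUGE variant was used. As a rough illustration only, the sketch below computes a unigram-overlap (ROUGE-1 style) F-score between a candidate summary and a gold standard summary. The function name `rouge1_f`, the whitespace tokenization, and the lowercasing are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def rouge1_f(candidate: str, reference: str) -> float:
    """Unigram-overlap F-score between a candidate and a reference summary.

    Assumption: tokens are lowercased, whitespace-separated words; the
    official ROUGE toolkit applies its own tokenization and options.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it occurs
    # in both summaries (multiset intersection).
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = "vector spaces trained on varied text produce better summaries"
    auto = "varied text gives better summaries"
    print(f"ROUGE-1 style F-score: {rouge1_f(auto, gold):.3f}")
```

An identical candidate and reference score 1.0, and summaries with no words in common score 0.0.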

References (13)
Arne Jönsson, Christian Smith. Enhancing extraction based summarization with outside word space. International Joint Conference on Natural Language Processing, pp. 1062-1070 (2011).
Magnus Sahlgren. Towards pertinent evaluation methodologies for word-space models. Language Resources and Evaluation, pp. 821-824 (2006).
Magnus Sahlgren. An Introduction to Random Indexing. Terminology and Knowledge Engineering (2005).
Magnus Sahlgren, Jussi Karlgren. From Words to Understanding. CSLI Publications, pp. 294-308 (2001).
Jussi Karlgren, Douglass Cutting. Recognizing text genres with simple metrics using discriminant analysis. International Conference on Computational Linguistics, vol. 2, pp. 1071-1075 (1994). DOI: 10.3115/991250.991324
Sergey Brin, Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. The Web Conference, vol. 30, pp. 107-117 (1998). DOI: 10.1016/S0169-7552(98)00110-X
Arne Jönsson, Christian Smith. Automatic summarization as means of simplifying texts, an evaluation for Swedish. Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011), pp. 198-205 (2011).
Rada Mihalcea. Graph-based ranking algorithms for sentence extraction, applied to text summarization. Meeting of the Association for Computational Linguistics, pp. 20- (2004). DOI: 10.3115/1219044.1219064