Authors: Kamal Al-Sabahi, Zuping Zhang, Jun Long, Khaled Alwesabi
DOI: 10.1007/S13369-018-3286-Z
Keywords: Latent semantic analysis, Selection algorithm, Natural language processing, Semantic memory, Word order, Sentence, Computer science, Part of speech, Weighting, Artificial intelligence, Linguistic Data Consortium
Abstract: The fast-growing amount of information on the Internet makes research in automatic document summarization very urgent. It is an effective solution for information overload. Many approaches have been proposed based on different strategies, such as latent semantic analysis (LSA). However, LSA, when applied to summarization, has some limitations which diminish its performance. In this work, we try to overcome these limitations by applying statistical and linear algebraic approaches combined with syntactic processing of the text. First, a part-of-speech tagger is utilized to reduce the dimension of LSA. Then, the weight of a term in four adjacent sentences is added to the weighting schemes while calculating the input matrix, to take word order relations into account. In addition, a new LSA-based sentence selection algorithm is proposed that draws on a description of each topic, which in turn makes the generated summary more informative and diverse. To ensure the effectiveness of the algorithm, extensive experiments on Arabic and English are done. Four datasets are used to evaluate the model: the Linguistic Data Consortium (LDC) Newswire-a corpus, the Essex Arabic Summaries Corpus (EASC), DUC2002, and the Multilingual MSS 2015 dataset. Experimental results on these datasets show that the model performs comprehensively better than state-of-the-art methods.
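As a rough illustration of the pipeline the abstract outlines, the sketch below builds a term-by-sentence matrix, adds a fraction of each term's count from the two sentences before and the two after (one possible reading of "four adjacent sentences"), and then picks one sentence per leading LSA topic. It is not the authors' implementation. The function names, the `neighbor_weight` value, the symmetric window, and the generic one-sentence-per-topic selection step are all assumptions made for illustration, and the part-of-speech filtering step is omitted because it needs an external tagger.

```python
import re
import numpy as np


def build_matrix(sentences, neighbor_weight=0.5, window=2):
    """Term-by-sentence counts plus a share of neighboring counts (assumed scheme)."""
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for sent in tokens for t in sent})
    index = {t: i for i, t in enumerate(vocab)}

    # Raw term frequency: rows are terms, columns are sentences.
    counts = np.zeros((len(vocab), len(sentences)))
    for j, sent in enumerate(tokens):
        for t in sent:
            counts[index[t], j] += 1.0

    # Assumption: add a fraction of each term's count in the `window`
    # sentences on either side (window=2 gives up to four neighbors).
    weighted = counts.copy()
    n = len(sentences)
    for j in range(n):
        for k in range(max(0, j - window), min(n, j + window + 1)):
            if k != j:
                weighted[:, j] += neighbor_weight * counts[:, k]
    return weighted, vocab


def select_sentences(matrix, num_sentences):
    """Generic LSA selection: one not-yet-chosen sentence per leading topic."""
    # Rows of vt are topics, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    chosen = []
    for topic in vt:
        if len(chosen) == num_sentences:
            break
        # Take the strongest sentence for this topic that is still unused.
        for j in np.argsort(-np.abs(topic)):
            if int(j) not in chosen:
                chosen.append(int(j))
                break
    return sorted(chosen)


if __name__ == "__main__":
    doc = [
        "Latent semantic analysis maps sentences to topics.",
        "Topics are found with singular value decomposition.",
        "The weather was pleasant on the day of the talk.",
        "Summaries keep the sentences that best cover each topic.",
    ]
    matrix, _ = build_matrix(doc)
    for i in select_sentences(matrix, 2):
        print(doc[i])
```

Picking at most one sentence per topic is a simple way to encourage diversity in the summary. The selection algorithm proposed in the paper differs from this generic step, and its details are not given in the abstract.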