An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization

Authors: Kamal Al-Sabahi, Zuping Zhang, Jun Long, Khaled Alwesabi

DOI: 10.1007/S13369-018-3286-Z

Keywords: Latent semantic analysis; Selection algorithm; Natural language processing; Semantic memory; Word order; Sentence; Computer science; Part of speech; Weighting; Artificial intelligence; Linguistic Data Consortium

Abstract: The fast-growing amount of information on the Internet makes research in automatic document summarization very urgent. It is an effective solution for information overload. Many approaches have been proposed based on different strategies, such as latent semantic analysis (LSA). However, LSA, when applied to document summarization, has some limitations which diminish its performance. In this work, we try to overcome these limitations by applying statistic and linear algebraic approaches combined with syntactic and semantic processing of the text. First, a part-of-speech tagger is utilized to reduce the dimension of LSA. Then, the weight of the term in four adjacent sentences is added to the weighting schemes while calculating the input matrix, to take into account word order and syntactic relations. In addition, a new LSA-based sentence selection algorithm is proposed, in which the term description is combined with the sentence description for each topic, which in turn makes the generated summary more informative and diverse. To ensure the effectiveness of the proposed LSA-based sentence selection algorithm, extensive experiments on Arabic and English are done. Four datasets are used to evaluate the new model: the Linguistic Data Consortium (LDC) Arabic Newswire-a corpus, the Essex Arabic Summaries Corpus (EASC), DUC2002, and the Multilingual MSS 2015 dataset. Experimental results on the four datasets show the effectiveness of the proposed model on Arabic and English datasets. It performs comprehensively better compared to the state-of-the-art methods.
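The general idea of LSA-based sentence selection with neighbor-aware weighting can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses plain term counts, a generic "one top sentence per latent topic" selection rule, and an assumed way of folding in the four adjacent sentences (two before, two after). The function names, the `neighbor_weight` parameter, and the rule that only terms already present in a sentence are boosted are all hypothetical choices made for illustration. The part-of-speech filtering step is omitted because it would require an external tagger.

```python
import re

import numpy as np


def build_term_sentence_matrix(sentences, neighbor_weight=0.5):
    """Build a term-by-sentence matrix from raw counts.

    Assumption for illustration: a term's weight in sentence j is its count
    in j, plus a fraction of its counts in the two preceding and the two
    following sentences, applied only to terms that occur in j.
    """
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}

    counts = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for tok in toks:
            counts[index[tok], j] += 1.0

    weighted = counts.copy()
    for j in range(len(sentences)):
        present = counts[:, j] > 0  # terms that actually occur in sentence j
        for k in (j - 2, j - 1, j + 1, j + 2):
            if 0 <= k < len(sentences):
                weighted[:, j] += neighbor_weight * counts[:, k] * present
    return weighted, vocab


def lsa_summarize(sentences, num_sentences=2, neighbor_weight=0.5):
    """Pick one sentence per leading latent topic, returned in document order."""
    matrix, _ = build_term_sentence_matrix(sentences, neighbor_weight)
    # Rows of vt are latent topics; columns correspond to sentences.
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)

    chosen = []
    for topic in vt:
        if len(chosen) >= num_sentences:
            break
        # Absolute values are used because the sign of singular vectors is arbitrary.
        for j in np.argsort(-np.abs(topic)):
            if int(j) not in chosen:
                chosen.append(int(j))
                break
    return [sentences[j] for j in sorted(chosen)]
```

Calling `lsa_summarize(list_of_sentences, num_sentences=3)` returns up to three sentences in their original order. Because each latent topic contributes at most one sentence, the selection tends toward diversity, which is the property the abstract emphasizes; the paper's own selection rule additionally combines term and sentence descriptions per topic, which this sketch does not attempt.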
