Using Unsupervised Deep Learning for Automatic Summarization of Arabic Documents

Authors: Nabil Alami, Noureddine En-nahnahi, Said Alaoui Ouatik, Mohammed Meknassi

DOI: 10.1007/S13369-018-3198-Y

Abstract: Traditional Arabic text summarization (ATS) systems are based on a bag-of-words representation, which involves sparse and high-dimensional input data. Thus, dimensionality reduction is greatly needed to increase the discriminative power of the features. In this paper, we present a new method for ATS using a variational auto-encoder (VAE) model to learn a feature space from high-dimensional input data. We explore several input representations, such as term frequency (tf) and tf-idf, with both local and global vocabularies. All sentences are ranked according to the latent representation produced by the VAE. We investigate the impact of using the VAE with two summarization approaches: graph-based and query-based. Experiments on two benchmark datasets specifically designed for ATS show that the VAE using the tf-idf representation of global vocabularies clearly provides a more discriminative feature space and improves the recall of other models. Experimental results confirm that the proposed method leads to better performance than most state-of-the-art extractive summarization approaches, for both graph-based and query-based summarization.
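
The input representation and query-based ranking mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the VAE encoding step is omitted, so sentences are ranked directly on their tf-idf vectors rather than on a learned latent representation. The function names (`tfidf_vectors`, `cosine`, `rank_by_query`), the whitespace tokenization, the smoothed idf weighting, and the choice to include the query when computing idf are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build tf-idf vectors over a vocabulary shared by all texts.

    Each text is treated as one document. Tokenization is a plain
    whitespace split (an assumption; real Arabic text would need
    proper normalization and stemming).
    """
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(tokenized)
    # Document frequency: number of texts that contain each term.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed idf (assumed weighting, not taken from the paper).
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term frequency within this text
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_by_query(sentences, query):
    """Return sentence indices ordered from most to least query-similar."""
    # The query is vectorized together with the sentences so that it
    # shares their vocabulary (a simplification for this sketch).
    _, vectors = tfidf_vectors(sentences + [query])
    query_vec = vectors[-1]
    scores = [cosine(v, query_vec) for v in vectors[:-1]]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

In the approach the abstract describes, the vectors passed to the similarity step would instead be the latent codes produced by the trained VAE encoder; the ranking logic around them would stay the same in this sketch.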
