Arabic Diacritization with Gated Recurrent Unit

Authors: Rajae Moumen, Raddouane Chiheb, Rdouan Faizi, Abdellatif El Afia

DOI: 10.1145/3230905.3230931

Keywords: Deep learning, Arabic, Natural language processing, Process (engineering), Unit (housing), Artificial intelligence, Computer science

Abstract: Arabic and similar languages require the use of diacritics in order to determine the parameters necessary to pronounce and identify every part of speech correctly. Therefore, when it comes to performing Natural Language Processing (NLP) over Arabic, diacritization is a crucial step. In this paper we use a gated recurrent unit network as a language-independent framework for diacritization. The end-to-end approach allows the system to be trained exclusively on vocalized text, without using external resources. Evaluation is performed versus state-of-the-art results in the literature. We demonstrate that we achieve results that enhance the learning process by scoring better performance in training and testing time.
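To illustrate what training "exclusively on vocalized text" can look like in practice, the following is a minimal sketch of how a vocalized string might be split into an undiacritized input and a per-character diacritic label sequence. It is not the authors' preprocessing: the function name `split_vocalized`, the choice of diacritic code points (U+064B to U+0652), and the convention of attaching a combined shadda-plus-vowel label to a single letter are all assumptions made for illustration.

```python
# Hypothetical preprocessing sketch, not taken from the paper.
# Assumed diacritic set: tanween (U+064B-U+064D), short vowels
# (U+064E-U+0650), shadda (U+0651), sukun (U+0652).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_vocalized(text):
    """Split vocalized text into (bare_text, labels).

    labels[i] is the (possibly empty) string of diacritics that
    followed the i-th base character of bare_text.
    """
    letters, labels = [], []
    for ch in text:
        if ch in DIACRITICS:
            # Attach the mark to the preceding base character;
            # a stray mark with no preceding character is dropped.
            if letters:
                labels[-1] += ch
        else:
            letters.append(ch)
            labels.append("")
    return "".join(letters), labels


# kaf+fatha, ta+fatha, ba+fatha
bare, labels = split_vocalized("\u0643\u064E\u062A\u064E\u0628\u064E")
```

Each `(bare, labels)` pair could then serve as one input/target sequence for a character-level sequence labeler such as a GRU network, with no external lexicon or morphological analyzer involved.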

References (13)
Alex Graves. Generating Sequences With Recurrent Neural Networks. arXiv: Neural and Evolutionary Computing (2013).
Gheith A. Abandah, Alex Graves, Balkees Al-Shagoor, Alaa Arabiyat, Fuad Jamour, Majid Al-Taee. Automatic diacritization of Arabic text using recurrent neural networks. International Journal on Document Analysis and Recognition (IJDAR), vol. 18, pp. 183-197 (2015). DOI: 10.1007/S10032-015-0242-2
Imed Zitouni, Ruhi Sarikaya. Arabic diacritic restoration approach based on maximum entropy models. Computer Speech & Language, vol. 23, pp. 257-276 (2009). DOI: 10.1016/J.CSL.2008.06.001
Sepp Hochreiter, Jürgen Schmidhuber. Long short-term memory. Neural Computation, vol. 9, pp. 1735-1780 (1997). DOI: 10.1162/NECO.1997.9.8.1735
Y. Bengio, P. Simard, P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, vol. 5, pp. 157-166 (1994). DOI: 10.1109/72.279181
Imed Zitouni, Jeffrey S. Sorensen, Ruhi Sarikaya. Maximum Entropy Based Restoration of Arabic Diacritics. Meeting of the Association for Computational Linguistics, pp. 577-584 (2006). DOI: 10.3115/1220175.1220248
Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Empirical Methods in Natural Language Processing, pp. 1724-1734 (2014). DOI: 10.3115/V1/D14-1179
Yonatan Belinkov, James Glass. Arabic Diacritization with Recurrent Neural Networks. Empirical Methods in Natural Language Processing, pp. 2281-2285 (2015). DOI: 10.18653/V1/D15-1274
Mohamed Boudchiche, Azzeddine Mazroui. Evaluation of the ambiguity caused by the absence of diacritical marks in Arabic texts: Statistical study. International Conference on Information and Communication Technology, pp. 1-6 (2015). DOI: 10.1109/ICTA.2015.7426904
Sameh Alansary. Alserag: An Automatic Diacritization System for Arabic. International Conference on Advanced Intelligent Systems and Informatics, pp. 182-192 (2016). DOI: 10.1007/978-3-319-48308-5_18