Arabic Retrieval Revisited: Morphological Hole Filling

Authors: Kareem Darwish, Ahmed Ali

Abstract: Due to Arabic's morphological complexity, Arabic retrieval benefits greatly from morphological analysis, particularly stemming. However, the best known stemming does not handle linguistic phenomena such as broken plurals and malformed stems. In this paper we propose a model of character-level morphological transformation that is trained using Wikipedia hypertext to page title links. The use of our model yields statistically significant improvements in Arabic retrieval over a statistical stemming technique. The technique can potentially be applied to other languages.
