New techniques for Arabic document classification

Author: Hamouda Khalifa Hamouda Chantar

Abstract: Text classification (TC) concerns automatically assigning a class (category) label to a text document, and has increasingly many applications, particularly in organizing large document collections for browsing. It is typically achieved via machine learning, where a model is built on the basis of a collection of document features. Feature selection is critical in this process, since there are several thousand potential features (distinct words or terms). In text classification, feature selection aims to improve computational efficiency and classification accuracy by removing irrelevant and redundant terms (features), while retaining the features (words) that contain sufficient information to help with the classification task. This thesis proposes binary particle swarm optimization (BPSO) hybridized with either K Nearest Neighbour (KNN) or Support Vector Machines (SVM) for feature selection in Arabic text classification tasks. The feature selection approaches are compared by using the selected features in conjunction with SVM, Decision Trees (C4.5), and Naive Bayes (NB) to classify a hold-out test set. Using publicly available Arabic datasets, the results show that the BPSO/KNN and BPSO/SVM techniques are promising in this domain. The sets of selected features are also analyzed to consider the differences in the types of features that BPSO/KNN and BPSO/SVM tend to choose. This leads to speculation concerning the appropriate feature selection strategy, based on the relationship between the classes in the categorization task at hand. The thesis also investigates the use of statistically extracted phrases of length two as terms in Arabic text classification. In comparison with the Bag of Words representation, using phrases alone in Arabic TC significantly decreases classifier performance, while combining the bag-of-words and phrase representations may slightly increase the performance of the SVM classifier.
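
The BPSO/KNN wrapper idea summarized in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: the fitness function (k-NN accuracy on a validation split minus a small penalty on subset size), the sigmoid transfer rule, the parameter values (`w`, `c1`, `c2`, `v_max`, `penalty`), and the helper names `knn_accuracy`, `bpso_select`, and `make_toy_data` are all hypothetical choices for illustration. The toy vectors stand in for weighted Arabic term vectors.

```python
import math
import random
from collections import Counter


def knn_accuracy(train, val, mask, k=3):
    """Accuracy of a k-NN classifier on `val` using only features with mask[j] == 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for x, y in val:
        # Squared Euclidean distance restricted to the selected features.
        nearest = sorted(
            (sum((x[j] - tx[j]) ** 2 for j in idx), ty) for tx, ty in train
        )[:k]
        vote = Counter(label for _, label in nearest).most_common(1)[0][0]
        if vote == y:
            correct += 1
    return correct / len(val)


def bpso_select(train, val, n_features, n_particles=10, n_iter=30,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, penalty=0.01, seed=0):
    """Binary PSO wrapper: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)

    def fitness(mask):
        # Assumed fitness: k-NN accuracy minus a small penalty on subset size.
        return knn_accuracy(train, val, mask) - penalty * sum(mask) / n_features

    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid transfer: the velocity sets the probability that bit d is 1.
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def make_toy_data(n_per_class=20, n_noise=6, seed=42):
    """Hypothetical data: feature 0 or 1 signals the class, the rest is noise."""
    rng = random.Random(seed)
    data = []
    for label, signal in (("sport", 0), ("economy", 1)):
        for _ in range(n_per_class):
            x = [rng.random() * 0.2 for _ in range(2)]
            x[signal] += 1.0
            x += [rng.random() for _ in range(n_noise)]
            data.append((x, label))
    rng.shuffle(data)
    return data


data = make_toy_data()
train, val = data[:28], data[28:]
mask, fit = bpso_select(train, val, n_features=8)
print("selected features:", [j for j, bit in enumerate(mask) if bit], "fitness:", round(fit, 3))
```

Each particle encodes one candidate feature subset as a bit vector, and the sigmoid of each velocity component gives the probability that the corresponding term is kept. Replacing `knn_accuracy` with an SVM-based score would give a BPSO/SVM variant of the same wrapper. In the setting the abstract describes, the chosen subset would then be evaluated with separate classifiers (SVM, C4.5, NB) on a hold-out test set, a step this sketch leaves out.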
