An effective dimension reduction algorithm for clustering Arabic text

Author: A.A. Mohamed

DOI: 10.1016/J.EIJ.2019.05.002

Keywords: Algorithm, Effective dimension, Dimensionality reduction, Computer science, Curse of dimensionality, Document clustering, Non-negative matrix factorization, Singular value decomposition, Principal component analysis, Cluster analysis

Abstract: Text clustering is a challenging task in natural language processing because of the very high-dimensional space it produces (i.e. the curse of dimensionality problem). Since these texts contain considerable amounts of ambiguities and redundancies, they produce different noise effects. For an efficient and accurate clustering algorithm, we need to extract the main concepts of the text by eliminating noise and reducing the dimensionality of the data. This paper compares three well-known dimension reduction algorithms for text clustering to show the pros and cons of each one, namely Principal Component Analysis (PCA), Nonnegative Matrix Factorization (NMF), and Singular Value Decomposition (SVD). It presents an effective dimension reduction algorithm for Arabic text clustering using PCA. For that purpose, a series of experiments has been conducted on two linguistic corpora, for both English and Arabic, and the results have been analyzed from a clustering quality point of view. The experiments have shown that PCA improves clustering quality and gives more interpretable results, with less time needed to cluster the documents.
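
The idea of reducing dimensionality before clustering can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the preprocessing, term weighting, or clustering algorithm, so the toy document-term matrix, the helper names `pca_reduce` and `kmeans`, and the choice of plain k-means are all assumptions made for illustration. PCA is computed here through an SVD of the mean-centered matrix.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered matrix: the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

def kmeans(X, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical toy data: rows are documents, columns are term counts.
# The first three documents share one vocabulary, the last three another.
docs = np.array([
    [3, 2, 0, 0, 1, 0],
    [2, 3, 1, 0, 0, 0],
    [4, 1, 0, 1, 0, 0],
    [0, 0, 3, 2, 0, 1],
    [0, 1, 2, 3, 0, 0],
    [1, 0, 4, 2, 0, 0],
], dtype=float)

reduced = pca_reduce(docs, n_components=2)  # 6 terms -> 2 components
labels = kmeans(reduced, k=2)               # cluster in the reduced space
print(reduced.shape, labels)
```

Clustering in the reduced space means each distance computation touches 2 values instead of 6 here; on a real corpus with tens of thousands of terms, that gap is what makes the reduction worthwhile.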
