An effective dimension reduction algorithm for clustering Arabic text

DOI: 10.1016/J.EIJ.2019.05.002

Keywords: Algorithm, Effective dimension, Dimensionality reduction, Computer science, Curse of dimensionality, Document clustering, Non-negative matrix factorization, Singular value decomposition, Principal component analysis, Cluster analysis

Abstract: Text clustering is a challenging task in natural language processing because of the very high-dimensional space it produces (the curse of dimensionality problem). Because these texts contain considerable amounts of ambiguity and redundancy, they produce different noise effects. An efficient and accurate clustering algorithm needs to extract the main concepts of the text by eliminating noise and reducing the dimensionality of the data. This paper compares three well-known dimension reduction algorithms, Principal Component Analysis (PCA), Nonnegative Matrix Factorization (NMF), and Singular Value Decomposition (SVD), to show the pros and cons of each. It presents an effective algorithm for clustering Arabic text using PCA. For that purpose, a series of experiments was conducted on two linguistic corpora, in both English and Arabic, and the results were analyzed from a clustering quality point of view. The experiments show that PCA improves clustering quality and gives more interpretable results, with less time needed to cluster the documents.
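The general idea of reducing dimensionality with PCA before clustering can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the toy English corpus, the raw term-frequency weighting, the power-iteration approximation of the principal axes, the basic k-means loop, and all function names (`term_matrix`, `leading_axis`, `pca_reduce`, `kmeans`) are hypothetical choices made so the example runs with the Python standard library alone. Real Arabic text would also need preprocessing such as stemming, which is omitted here.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def term_matrix(docs):
    """Dense term-frequency matrix: one row per document, one column per term."""
    vocab = sorted({w for d in docs for w in d.split()})
    col = {w: j for j, w in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w in d.split():
            row[col[w]] += 1.0
        rows.append(row)
    return rows

def leading_axis(X, iters=100, seed=0):
    """Approximate the top eigenvector of X^T X by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() for _ in X[0]]
    for _ in range(iters):
        scores = [dot(r, v) for r in X]            # X v
        w = [dot(scores, c) for c in zip(*X)]      # X^T (X v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:                            # no variance left to capture
            break
        v = [x / norm for x in w]
    return v

def deflate(X, v):
    """Remove from every row its component along the unit vector v."""
    out = []
    for r in X:
        s = dot(r, v)
        out.append([x - s * vj for x, vj in zip(r, v)])
    return out

def pca_reduce(X, k):
    """Center X and project its rows onto the first k principal axes."""
    means = [sum(c) / len(X) for c in zip(*X)]
    Xc = [[x - m for x, m in zip(r, means)] for r in X]
    residual, axes = Xc, []
    for i in range(k):
        v = leading_axis(residual, seed=i)
        axes.append(v)
        residual = deflate(residual, v)
    return [[dot(r, v) for v in axes] for r in Xc]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k=2, iters=20):
    """Basic k-means with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy corpus: three sports documents, then three finance documents.
docs = [
    "team match goal player",
    "team match goal win",
    "team match player win",
    "bank market price stock",
    "bank market price trade",
    "bank market stock trade",
]

reduced = pca_reduce(term_matrix(docs), k=2)   # 10 term dimensions down to 2
labels = kmeans(reduced, k=2)
print(labels)
```

On this toy corpus the first principal axis captures the contrast between the two topics, so the clustering step works on two coordinates per document instead of one per vocabulary term. The abstract's comparison with NMF and SVD is not reproduced here.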



