Author: A.A. Mohamed
DOI: 10.1016/J.EIJ.2019.05.002
Keywords: Algorithm, Effective dimension, Dimensionality reduction, Computer science, Curse of dimensionality, Document clustering, Non-negative matrix factorization, Singular value decomposition, Principal component analysis, Cluster analysis
Abstract: Text clustering is a challenging task in natural language processing because of the very high-dimensional space that the process produces (i.e., the curse of dimensionality problem). Since texts contain considerable ambiguity and redundancy, they introduce various kinds of noise. An efficient and accurate clustering algorithm therefore needs to extract the main concepts of the text while eliminating noise and reducing the dimensionality of the data. This paper compares three well-known dimension reduction algorithms, namely Principal Component Analysis (PCA), Nonnegative Matrix Factorization (NMF), and Singular Value Decomposition (SVD), to show the pros and cons of each. It also presents an effective algorithm for Arabic text clustering using PCA. For that purpose, a series of experiments was conducted on two linguistic corpora, covering both English and Arabic, and the results were analyzed from the point of view of clustering quality. The experiments have shown that PCA improves clustering quality and gives more interpretable results, with less time needed to cluster documents.
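The general pipeline the abstract describes can be sketched in a few lines of NumPy. First, build a high-dimensional term matrix. Next, reduce it with PCA, SVD, or NMF. Finally, cluster the reduced document vectors.

This is a minimal illustration under stated assumptions, not the paper's implementation. The following are all hypothetical choices made here; the abstract does not specify any of them:

- the six-document toy corpus,
- raw term counts as features,
- Lloyd's k-means as the clustering step,
- the Lee-Seung multiplicative-update rule for NMF.

```python
import numpy as np

# Hypothetical toy corpus (not from the paper): three finance and three football documents.
DOCS = [
    "stock market shares trading prices",
    "market prices stock investors trading",
    "shares investors market stock",
    "football match goal team players",
    "team players football coach match",
    "goal match team football fans",
]


def term_document_matrix(docs):
    """Raw term counts: one row per document, one column per vocabulary word (assumed features)."""
    vocab = sorted({w for d in docs for w in d.split()})
    col = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for r, doc in enumerate(docs):
        for w in doc.split():
            X[r, col[w]] += 1.0
    return X, vocab


def pca_reduce(X, k):
    """PCA: centre each term column, then project onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def svd_reduce(X, k):
    """Truncated SVD (LSA-style): same kind of projection, but without centring."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]


def nmf_reduce(X, k, iters=500, seed=0):
    """NMF via Lee-Seung multiplicative updates (assumed variant); returns W in X ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + 0.1
    H = rng.random((k, X.shape[1])) + 0.1
    eps = 1e-9  # avoids division by zero; entries stay non-negative
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W


def kmeans(Z, k, iters=100):
    """Minimal Lloyd's k-means with deterministic farthest-point seeding (assumed clustering step)."""
    centres = [Z[0]]
    while len(centres) < k:
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(Z[np.argmax(dist)])
    centres = np.array(centres)
    for _ in range(iters):
        d2 = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old centre if a cluster ever becomes empty.
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


X, vocab = term_document_matrix(DOCS)
reduced = {
    "PCA": pca_reduce(X, 2),
    "SVD": svd_reduce(X, 2),
    "NMF": nmf_reduce(X, 2),
}
for name, Z in reduced.items():
    print(name, X.shape, "->", Z.shape, "clusters:", kmeans(Z, 2))
```

How the three reducers differ:

- **PCA** centres the term columns before the SVD, so it captures directions of maximum variance.
- **Plain truncated SVD** works on the raw counts.
- **NMF** keeps every factor non-negative, which is often cited as aiding interpretability.

On this toy matrix, the 13 term columns are reduced to 2 dimensions. The PCA and SVD projections separate the finance documents from the football documents. None of this reproduces the English and Arabic experiments reported in the paper.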