Document Classification Using Enhanced Grid Based Clustering Algorithm

Authors: Mohamed Ahmed Rashad, Hesham El-Deeb, Mohamed Waleed Fakhr

DOI: 10.1007/978-3-319-06764-3_27

Keywords: Artificial intelligence, Cluster analysis, Document clustering, Canopy clustering algorithm, Correlation clustering, k-means clustering, CURE data clustering algorithm, Pattern recognition, Computer science, Data stream clustering, Document classification

Abstract: Automated document clustering is an important text mining task, especially with the rapid growth in the number of online documents in the Arabic language. Text clustering aims to automatically assign a text to a predefined cluster based on linguistic features. This research proposes an enhanced grid based clustering algorithm. The main purpose of this algorithm is to divide the data space into clusters of arbitrary shape. These clusters are considered as dense regions of points in the data space that are separated by regions of low density representing noise. The algorithm also deals with clustering a data set with multiple densities and with assigning noise and outliers to the closest category, which will reduce the time complexity. Unclassified documents are preprocessed by removing stop words and extracting word roots, which is used to reduce the dimensionality of the documents' feature vectors. Each document is then represented as a vector of words and their frequencies. Accuracy is presented in terms of time consumption and the percentage of successfully clustered instances. The results of experiments carried out on in-house collected Arabic text have proven the effectiveness of the algorithm, with an average accuracy of 89%.
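
The grid-based idea in the abstract (dense regions form clusters, sparse regions are noise, and noise is then assigned to the closest category) can be illustrated with a minimal Python sketch. This is not the authors' algorithm. The function name `grid_cluster`, the parameters `cell_size` and `min_pts`, and the use of cluster centroids to pick the "closest" cluster are all assumptions made for illustration. The handling of multiple densities is not covered. The sketch also assumes low-dimensional points (for example, document vectors after dimensionality reduction), because it checks every adjacent grid cell.

```python
from collections import deque
from itertools import product
import math


def grid_cluster(points, cell_size, min_pts):
    """Return one cluster label per point using a simple grid-density scheme."""
    # Bucket point indices by the grid cell that contains them.
    cells = {}
    for i, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        cells.setdefault(key, []).append(i)

    # A cell is "dense" if it holds at least min_pts points (assumed rule).
    dense = {k for k, idxs in cells.items() if len(idxs) >= min_pts}
    dim = len(points[0])
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    labels = [-1] * len(points)  # -1 marks noise until reassigned
    cell_label = {}
    n_clusters = 0
    # Flood-fill over adjacent dense cells: each connected group is one cluster.
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = n_clusters
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for i in cells[cur]:
                labels[i] = n_clusters
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cur, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = n_clusters
                    queue.append(nb)
        n_clusters += 1

    if n_clusters == 0:
        return labels  # nothing dense: every point stays noise

    # Compute one centroid per cluster from the points labelled so far.
    sums = [[0.0] * dim for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for p, lab in zip(points, labels):
        if lab >= 0:
            counts[lab] += 1
            for d in range(dim):
                sums[lab][d] += p[d]
    centroids = [[s / counts[c] for s in sums[c]] for c in range(n_clusters)]

    # Assign each noise point to the cluster with the nearest centroid.
    for i, p in enumerate(points):
        if labels[i] == -1:
            labels[i] = min(range(n_clusters),
                            key=lambda c: math.dist(p, centroids[c]))
    return labels


pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),
       (5.1, 5.2), (5.3, 5.4), (5.2, 5.1),
       (1.5, 1.5)]
print(grid_cluster(pts, cell_size=1.0, min_pts=3))  # [0, 0, 0, 1, 1, 1, 0]
```

In this toy input the last point falls in a sparse cell, so it starts as noise and is then attached to the nearer of the two dense groups. Points are bucketed in a single pass and only occupied cells are visited afterwards, which is the usual reason grid methods are considered fast.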
