Authors: Mohamed Ahmed Rashad, Hesham El-Deeb, Mohamed Waleed Fakhr
DOI: 10.1007/978-3-319-06764-3_27
Keywords: Artificial intelligence, Cluster analysis, Document clustering, Canopy clustering algorithm, Correlation clustering, k-means clustering, CURE data clustering algorithm, Pattern recognition, Computer science, Data stream clustering, Document classification
Abstract: Automated document clustering is an important text mining task, especially with the rapid growth in the number of online documents in the Arabic language. Text clustering aims to automatically assign a text to a predefined cluster based on linguistic features. This research proposes an enhanced grid-based clustering algorithm. The main purpose of this algorithm is to divide the data space into clusters of arbitrary shape. These clusters are considered as dense regions of points that are separated by regions of low density representing noise. It also deals with clustering data sets with multiple densities and with assigning noise and outliers to the closest category, which will reduce time complexity. Unclassified documents are preprocessed by removing stop words and extracting word roots, which is used to reduce the dimensionality of the documents' feature vectors. Each document is then represented as a vector of words and their frequencies. Accuracy is presented in terms of time consumption and the percentage of successfully clustered instances. The results of experiments carried out on an in-house collected data set have proven its effectiveness, with an average accuracy of 89%.
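
The pipeline outlined in the abstract (stop-word removal, word-frequency vectors, then grid-based density clustering with noise assigned to the closest category) can be illustrated with a minimal sketch. This is not the authors' enhanced algorithm: root extraction and multi-density handling are omitted, and the stop-word list, the function names `term_frequencies` and `grid_cluster`, and the parameters `cell_size` and `min_pts` are all assumptions made for illustration.

```python
from collections import Counter, deque
from itertools import product
import math

# Hypothetical stop-word list, for illustration only (the paper's list is not given).
STOP_WORDS = {"في", "من", "على"}

def term_frequencies(text, stop_words=STOP_WORDS):
    """Count word occurrences after dropping stop words (no root extraction here)."""
    return Counter(w for w in text.split() if w not in stop_words)

def grid_cluster(points, cell_size, min_pts):
    """Return one cluster id per point using a simple grid-density scheme."""
    # 1. Map every point to the grid cell that contains it.
    cells = {}
    for i, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        cells.setdefault(key, []).append(i)

    # 2. A cell counts as dense when it holds at least min_pts points.
    dense = {k for k, idx in cells.items() if len(idx) >= min_pts}

    # 3. Merge adjacent dense cells into clusters with a breadth-first flood fill.
    cell_label = {}
    next_id = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_id
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for off in product((-1, 0, 1), repeat=len(cur)):
                nb = tuple(c + o for c, o in zip(cur, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_id
                    queue.append(nb)
        next_id += 1

    labels = [None] * len(points)
    for k, cid in cell_label.items():
        for i in cells[k]:
            labels[i] = cid

    # 4. Points left in sparse cells (noise) go to the nearest cluster centroid.
    centroids = {}
    for cid in range(next_id):
        members = [points[i] for i, lab in enumerate(labels) if lab == cid]
        centroids[cid] = [sum(c) / len(members) for c in zip(*members)]
    for i, lab in enumerate(labels):
        if lab is None and centroids:
            labels[i] = min(centroids,
                            key=lambda c: math.dist(points[i], centroids[c]))
    return labels
```

In this sketch the points are assumed to be low-dimensional; real term-frequency vectors would need dimensionality reduction first, because the number of neighbouring cells grows exponentially with the number of dimensions.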