Authors: Tehseen Zia, Qaiser Abbas, Muhammad Pervez Akhtar
Keywords: Artificial intelligence, Feature selection, Information gain ratio, Support vector machine, Feature (computer vision), Model selection, Statistical classification, C4.5 algorithm, Computer science, Pattern recognition, Decision tree
Abstract: Efficient feature selection is an important phase of designing an effective text categorization system. Various methods have been proposed for selecting dissimilar feature sets. It is often essential to evaluate which method suits a given task better and what feature-set size is a good model choice. The aim of this paper is to answer these questions for Urdu text categorization. Five widely used feature selection methods were examined using six well-known classification algorithms: naive Bayes (NB), k-nearest neighbor (KNN), support vector machines (SVM) with linear, polynomial and radial basis kernels, and a decision tree (i.e. J48). The study was conducted over two test collections: the EMILLE collection and a second collection. We observed that three feature selection methods, i.e. information gain, Chi statistics and symmetrical uncertainty, performed uniformly in most of the cases, if not all. Moreover, we found no single method best for all classifiers. While gain ratio out-performed the others with J48, a different method showed top performance with KNN and the SVM kernels. Overall, a linear SVM combined with any of information gain, Chi statistics or symmetric uncertainty turned out to be the first choice across the other combinations of classifiers and selection methods on the moderate-sized corpus. On the other hand, its advantage diminishes on the small-sized corpus.
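To make the kind of pipeline compared in the abstract concrete, here is a minimal illustrative sketch (not the authors' code or data): a bag-of-words representation, a feature selection step, and a linear-kernel SVM, built with scikit-learn. The toy documents, labels and the choice of k are hypothetical placeholders; Chi-square scoring stands in for the "Chi statistics" criterion, and scikit-learn has no built-in gain ratio or symmetrical uncertainty, so those would require custom score functions.

```python
# Illustrative sketch only: feature selection + classification for text
# categorization, in the spirit of the study described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus; a real study would use labelled Urdu documents.
docs = [
    "cricket match score win",
    "election vote parliament minister",
    "stock market shares profit",
    "team player tournament final",
    "budget tax economy growth",
    "government policy assembly law",
]
labels = ["sports", "politics", "business", "sports", "business", "politics"]

pipeline = Pipeline([
    ("bow", CountVectorizer()),           # bag-of-words term counts
    ("select", SelectKBest(chi2, k=10)),  # keep the 10 highest Chi-square terms
    ("clf", LinearSVC()),                 # linear-kernel SVM classifier
])
# Swapping the score function (e.g. mutual_info_classif as a proxy for
# information gain) or the classifier reproduces other combinations.
pipeline.fit(docs, labels)
print(pipeline.predict(["parliament passes new law"]))
```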