Authors: Meng Lei, Xinhui Yu, Ming Li, Wenxiang Zhu
DOI: 10.1016/J.INFRARED.2018.05.018
Keywords: Dimension (vector space), Random forest, Artificial intelligence, Stability (learning theory), Oversampling, Redundancy (engineering), Environmental pollution, Support vector machine, Noise, Computer science, Pattern recognition
Abstract: Traditional methods for identifying the origin of coal have the drawbacks of complex operation, sample damage, and environmental pollution. Near-infrared spectroscopy is a new method that can solve these problems effectively. However, the spectra had the features of high dimension, redundancy, and noise. Also, the data set was small and imbalanced. Therefore, this study chose the Random Forest (RF) algorithm as the basic modeling algorithm. In addition, K-means was introduced to improve the Synthetic Minority Oversampling Technique (SMOTE) in order to overcome the imbalanced data set. A comparison of the Support Vector Machine (SVM) model, the RF model, and the improved RF model indicated that the improved RF model reached an overall accuracy of 97.92%, a G-mean value of 0.9696, and an average voting rate of 83.09%. These results were 6.25%, 7.03%, and 6.94% higher, respectively, than their counterparts for the RF model. They were also 8.34% and 5.86% higher than those of the SVM model in overall accuracy and G-mean, respectively. The results suggested that the improved RF model produced reliable accuracy, validity, and stability. Its results conformed to the analysis of coal-forming factors. Consequently, the improved RF model is applicable to identifying the geographic origin of coal rapidly.