Geographic origin identification of coal using near-infrared spectroscopy combined with improved random forest method

Authors: Meng Lei, Xinhui Yu, Ming Li, Wenxiang Zhu

DOI: 10.1016/J.INFRARED.2018.05.018

Keywords: Dimension (vector space); Random forest; Artificial intelligence; Stability (learning theory); Oversampling; Redundancy (engineering); Environmental pollution; Support vector machine; Noise; Computer science; Pattern recognition

Abstract: Traditional methods for identifying the origin of coal have the drawbacks of complex operation, sample damage and environmental pollution. Near-infrared spectroscopy is a new method that can solve these problems effectively. However, the spectra had the features of high dimension, redundancy and noise, and the data set was small and imbalanced. Therefore, this study chose the Random Forest (RF) algorithm as the basic modeling algorithm. In addition, K-means was introduced to improve the Synthetic Minority Oversampling Technique (SMOTE) and so overcome the imbalance in the data set. A comparison of the Support Vector Machine (SVM) model, the RF model and the improved RF model indicated that the improved RF model reached an overall accuracy of 97.92%, a G-mean value of 0.9696 and an average voting rate of 83.09%. These results were 6.25%, 7.03% and 6.94% higher, respectively, than those of the RF model. At the same time, they were 8.34% and 5.86% higher than those of the SVM model in overall accuracy and G-mean, respectively. The suggested model produced reliable accuracy, validity and stability, and its results conformed to the analysis of coal-forming factors. Consequently, the improved RF model is applicable for rapidly identifying the geographic origin of coal.

References (17)
Wenqian Huang, Yan'an Wang, Jianhua Guo, Zhiming Guo, Chunjiang Zhao, Nondestructive Quantification of Foliar Chlorophyll in an Apple Orchard by Visible/Near-Infrared Reflectance Spectroscopy and Partial Least Squares. Spectroscopy Letters, vol. 47, pp. 481-487 (2014), 10.1080/00387010.2013.816748
Rok Blagus, Lara Lusa, SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics, vol. 14, p. 106 (2013), 10.1186/1471-2105-14-106
Hui Chen, Zan Lin, Hegang Wu, Li Wang, Tong Wu, Chao Tan, Diagnosis of colorectal cancer by near-infrared optical fiber spectroscopy and random forest. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, vol. 135, pp. 185-191 (2015), 10.1016/J.SAA.2014.07.005
Igor Melnykov, Volodymyr Melnykov, On K-means algorithm with the use of Mahalanobis distances. Statistics & Probability Letters, vol. 84, pp. 88-95 (2014), 10.1016/J.SPL.2013.09.026
Yasheng Wang, Meng Yang, Gao Wei, Ruifen Hu, Zhiyuan Luo, Guang Li, Improved PLS regression based on SVM classification for rapid analysis of coal properties by near-infrared reflectance spectroscopy. Sensors and Actuators B: Chemical, vol. 193, pp. 723-729 (2014), 10.1016/J.SNB.2013.12.028
Sri Widodo, Wolfgang Oschmann, Achim Bechtel, Reinhard F. Sachsenhofer, Komang Anggayana, Wilhelm Puettmann, Distribution of sulfur and pyrite in coal seams from Kutai Basin (East Kalimantan, Indonesia): Implications for paleoenvironmental conditions. International Journal of Coal Geology, vol. 81, pp. 151-162 (2010), 10.1016/J.COAL.2009.12.003
Dong Won Kim, Jong Min Lee, Jae Sung Kim, Application of near infrared diffuse reflectance spectroscopy for on-line measurement of coal properties. Korean Journal of Chemical Engineering, vol. 26, pp. 489-495 (2009), 10.1007/S11814-009-0083-0
U. Maulik, S. Bandyopadhyay, Performance evaluation of some clustering algorithms and validity indices. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 1650-1654 (2002), 10.1109/TPAMI.2002.1114856
Mariana Belgiu, Lucian Drăguţ, Random forest in remote sensing: A review of applications and future directions. ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, pp. 24-31 (2016), 10.1016/J.ISPRSJPRS.2016.01.011