Linear models for cost-sensitive classification

Author: Parag C. Pendharkar

DOI: 10.1111/EXSY.12114

Abstract: In this paper, we investigate the performance of statistical, mathematical programming and heuristic linear models for cost-sensitive classification. In particular, we use five techniques: Fisher's discriminant analysis (DA), asymmetric misclassification cost mixed integer programming (AMC-MIP), a cost-sensitive support vector machine (CS-SVM), a hybrid SVMIP and a genetic algorithm (CGA). Using simulated datasets with varying group overlaps, data distributions and class biases, and real-world datasets from the financial and medical domains, we compare the performance of our five techniques based on overall holdout sample misclassification cost. The results of our experiments indicate that when group overlap is low and the data distribution is exponential, DA appears to provide superior performance. For all other situations with simulated datasets, CS-SVM provides superior performance. In the case of real-world datasets from the financial domain, AMC-MIP holds a slight edge over the two SVM-based classifiers. However, for medical-domain datasets with continuous and discrete attributes, the SVM classifiers perform better than the AMC-MIP model. The CGA is the most computationally inefficient and poorest performing model.
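The evaluation criterion named in the abstract, overall holdout sample misclassification cost, can be illustrated with a minimal Python sketch. This is not the paper's code: the function names, the weights, the bias, the toy holdout sample and the cost values (5.0 for a false negative, 1.0 for a false positive) are all assumptions chosen only for illustration.

```python
from typing import List, Sequence


def linear_predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Assign class 1 if the linear score w.x + b is non-negative, else class 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else 0


def misclassification_cost(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    cost_fn: float,
    cost_fp: float,
) -> float:
    """Sum asymmetric costs over a sample: cost_fn for each missed class-1 case,
    cost_fp for each class-0 case wrongly assigned to class 1."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            total += cost_fn  # false negative
        elif t == 0 and p == 1:
            total += cost_fp  # false positive
    return total


# Hypothetical holdout sample and an already-fitted linear model (assumed values).
X_holdout = [(2.0, 1.0), (0.5, 0.2), (-1.0, 0.3), (1.5, 0.0)]
y_holdout = [1, 1, 0, 0]
weights, bias = (1.0, 1.0), -1.0

predictions: List[int] = [linear_predict(weights, bias, x) for x in X_holdout]
holdout_cost = misclassification_cost(y_holdout, predictions, cost_fn=5.0, cost_fp=1.0)
print(predictions, holdout_cost)
```

Under these assumed costs a single false negative weighs five times as much as a false positive, so two classifiers with the same error count can differ widely in total cost. That is why the comparison in the abstract is stated in terms of cost rather than accuracy.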
