Learning from biased data using mixture models

Author: A. J. Feelders

Abstract: Databases sometimes contain a non-random sample from the population of interest. This complicates the use of extracted knowledge for predictive purposes. We consider a specific type of biased data that is of considerable practical interest, namely partially classified data. Such data typically result when some screening mechanism determines whether the correct class of a particular case is known. In credit scoring, the problem of learning from such data is called "reject inference", since the class label (e.g. good or bad loan) of rejected loan applications is unknown. We show that maximum likelihood estimation of so-called mixture models is appropriate for this type of data, and discuss an experiment performed on simulated data using mixtures of normal components. The benefits of this approach are shown by a comparison with sample-based discriminant analysis. Some directions are given on how to extend the analysis to allow for non-normal components and missing attribute values, in order to make it suitable for "real-life" data.
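The idea of fitting a mixture model by maximum likelihood to partially classified data can be illustrated with a small EM sketch. This is not the author's experiment; it is a minimal illustration under stated assumptions: a one-dimensional score, two normal components ("good" and "bad" loans), and a hypothetical screening rule that rejects every applicant whose score is at or above a cutoff. Because rejection here depends only on the observed score, the labels are missing at random, so the selection mechanism can be left out of the likelihood. Accepted cases contribute their known class; rejected cases contribute only the mixture density. All names (`simulate`, `em_partially_classified`, `normal_pdf`) and all parameter values are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def simulate(n, rng, pi_bad=0.3, cutoff=1.0):
    """Hypothetical screening: applicants with score x >= cutoff are rejected."""
    labeled, unlabeled = [], []
    for _ in range(n):
        k = 1 if rng.random() < pi_bad else 0    # 1 = "bad" loan, 0 = "good"
        x = rng.gauss(2.0 if k == 1 else 0.0, 1.0)
        if x < cutoff:
            labeled.append((x, k))                # accepted: outcome observed
        else:
            unlabeled.append(x)                   # rejected: outcome unknown
    return labeled, unlabeled


def em_partially_classified(labeled, unlabeled, max_iter=1000, tol=1e-6):
    """EM for a two-component normal mixture where some class labels are known."""
    n = len(labeled) + len(unlabeled)
    all_x = [x for x, _ in labeled] + list(unlabeled)
    sd_all = statistics.pstdev(all_x)
    # Heuristic start: class means from labeled cases, common spread, equal weights.
    mu = [statistics.mean(x for x, k in labeled if k == j) for j in (0, 1)]
    sigma = [sd_all, sd_all]
    pi = [0.5, 0.5]
    for _ in range(max_iter):
        # E-step: posterior P(class 1 | x) for the unlabeled (rejected) cases only;
        # labeled cases keep weight 1 on their observed class.
        post1 = []
        for x in unlabeled:
            d0 = pi[0] * normal_pdf(x, mu[0], sigma[0])
            d1 = pi[1] * normal_pdf(x, mu[1], sigma[1])
            post1.append(d1 / (d0 + d1))
        old = mu + sigma + pi
        # M-step: weighted ML updates over all cases, labeled and unlabeled.
        for j in (0, 1):
            pairs = [(x, 1.0) for x, k in labeled if k == j]
            pairs += [(x, p if j == 1 else 1.0 - p) for x, p in zip(unlabeled, post1)]
            nj = sum(w for _, w in pairs)
            pi[j] = nj / n
            mu[j] = sum(w * x for x, w in pairs) / nj
            sigma[j] = math.sqrt(sum(w * (x - mu[j]) ** 2 for x, w in pairs) / nj)
        # Stop once no parameter moves by more than tol.
        if max(abs(a - b) for a, b in zip(old, mu + sigma + pi)) < tol:
            break
    return pi, mu, sigma


rng = random.Random(0)
labeled, unlabeled = simulate(4000, rng)
# Accepted-only estimate of the "bad" class mean, ignoring the rejected cases.
naive_mu1 = statistics.mean(x for x, k in labeled if k == 1)
pi, mu, sigma = em_partially_classified(labeled, unlabeled)
print(f"accepted-only mean of bad class: {naive_mu1:.2f}")
print(f"mixture ML estimates: pi={pi}, mu={mu}, sigma={sigma}")
```

In this setup the accepted-only mean of the bad class is necessarily below the cutoff, because every labeled case has a score below it, whereas the mixture estimate also draws on the scores of the rejected applicants. The initialization is a heuristic choice; with more components or poorly separated classes, EM can stop at a local maximum, so several starting points may be needed.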