Abstract: A long-running issue in scorecard construction is how to handle dramatically unbalanced class sizes. This is important because, in many applications, the class sizes are very different. For example, it is common to find that 'bad' customers constitute less than 10% of the customer base, and even more extreme situations often arise: Brause et al (1999) remark that in their database of credit card transactions 'the probability of fraud is low (0.2%) and has been lowered in a preprocessing step by a conventional detecting system down to 0.1%,' while Hassibi (2000) comments that 'out of some 12 billion transactions made annually, approximately 10 million – or one out of every 1200 transactions – turn out to be fraudulent. Also, 0.04% (4 out of 10,000) of all monthly active accounts are fraudulent.' In coping with unbalanced classes, there are two issues to be considered. Firstly, what performance criterion is appropriate? And, secondly, how should scorecards be constructed, and any parameters estimated, from such unbalanced data? We look at each of these problems. For the first problem, we illustrate the effect of a marked lack of balance on performance criteria, demonstrating how easy it is to be misled. The lack of balance means that simple error counts are inappropriate as criteria. Rather, misclassifications of the smaller class must be regarded as more serious than the converse: different costs must be adopted for the different kinds of misclassification. We examine the implications of this. For the second, we examine both classical linear scorecards and powerful k-nearest-neighbour nonparametric methods, which have been used in fraud detection. In the linear case (and, more generally, for any parametric form) improved classification accuracy can be achieved by focusing on particular parts of the data space, the relevant parts being implied by the relative misclassification costs. We describe a new tool for constructing scorecards which takes this fact into account. In the k-nearest-neighbour case we draw attention to a phenomenon which we believe has not been previously reported, and which has an important bearing on the choice of k. We illustrate the methods using a large set of unsecured personal loan data.
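The point about simple error counts can be made concrete with a small arithmetic sketch. The Python fragment below is illustrative only and is not taken from the paper: the population size, the two hypothetical classification rules, and the 100:1 cost ratio are all assumptions, chosen to echo the 0.1% fraud rate quoted above.

```python
# Illustrative sketch: why raw error counts mislead when classes are unbalanced.
# All numbers are assumptions, not results from the paper.

N_TOTAL = 100_000            # assumed number of transactions
N_FRAUD = 100                # 0.1% fraud rate, echoing the figure quoted above
COST_MISSED_FRAUD = 100.0    # assumed cost of treating a fraud as legitimate
COST_FALSE_ALARM = 1.0       # assumed cost of flagging a legitimate transaction


def error_rate(missed_fraud, false_alarms, n_total=N_TOTAL):
    """Proportion of all cases misclassified, ignoring the kind of error."""
    return (missed_fraud + false_alarms) / n_total


def total_cost(missed_fraud, false_alarms):
    """Misclassification cost with a different cost for each kind of error."""
    return missed_fraud * COST_MISSED_FRAUD + false_alarms * COST_FALSE_ALARM


# Rule A (hypothetical): label every transaction legitimate.
# It misses all frauds but raises no false alarms.
rule_a = (N_FRAUD, 0)

# Rule B (hypothetical): catches 80 of the 100 frauds at the price of
# 500 false alarms.
rule_b = (20, 500)

for name, (missed, alarms) in [("A", rule_a), ("B", rule_b)]:
    print(name, error_rate(missed, alarms), total_cost(missed, alarms))
```

Under these assumed figures, rule A has the lower error rate (0.1% against 0.52%) although it detects no fraud at all, while rule B incurs a quarter of the cost (2,500 against 10,000). Which rule is preferred therefore depends on the relative misclassification costs, not on the error count.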