The Problem with Kappa

Authors: David Martin Ward Powers

Keywords: Kappa, Context (language use), Fleiss' kappa, Statistics, Computer science, Set (abstract data type), Skew, Receiver operating characteristic, Cohen's kappa, Algorithm, Computational linguistics

Abstract: It is becoming clear that traditional evaluation measures used in Computational Linguistics (including Error Rates, Accuracy, Recall, Precision and F-measure) are of limited value for unbiased evaluation of systems, and are not meaningful for comparison of algorithms unless both the dataset and the algorithm parameters are strictly controlled for skew (Prevalence and Bias). The use of techniques originally designed for other purposes, in particular Receiver Operating Characteristics Area Under Curve, plus variants of Kappa, has been proposed to fill the void. This paper aims to clear up some of the confusion relating to evaluation, by demonstrating that the usefulness of each evaluation method is highly dependent on the assumptions made about the distributions of the dataset and the underlying populations. The behaviour of a number of evaluation measures is compared under common assumptions. Deploying a system in a context which has the opposite skew from its validation set can be expected to approximately negate Fleiss Kappa and halve Cohen Kappa, but leave Powers Kappa unchanged. For most performance evaluation purposes, the latter is thus most appropriate, whilst for comparison of behaviour, Matthews Correlation is recommended.
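The measures named in the abstract can be compared on a single 2x2 contingency table. The following is a minimal sketch, not the paper's own code: it uses the standard textbook definitions of Cohen's kappa, the two-rater form of Fleiss' kappa (Scott's pi) and Matthews Correlation, and it assumes that "Powers Kappa" refers to Informedness (true positive rate minus false positive rate). The function name `agreement_measures` and the example counts are hypothetical.

```python
from math import sqrt


def agreement_measures(tp, fn, fp, tn):
    """Chance-corrected measures for a 2x2 table of counts.

    Rows are the real classes, columns the predicted classes.
    Assumption: "Powers Kappa" is taken to be Informedness (TPR - FPR).
    """
    real_pos, real_neg = tp + fn, fp + tn
    pred_pos, pred_neg = tp + fp, fn + tn
    if min(real_pos, real_neg, pred_pos, pred_neg) <= 0:
        raise ValueError("every row and column total must be positive")
    n = real_pos + real_neg

    prevalence = real_pos / n   # proportion of real positives
    bias = pred_pos / n         # proportion of predicted positives
    accuracy = (tp + tn) / n

    # Cohen: chance agreement from each side's own marginal.
    e_cohen = prevalence * bias + (1 - prevalence) * (1 - bias)
    # Fleiss (two raters): chance agreement from the pooled marginal.
    pooled = (prevalence + bias) / 2
    e_fleiss = pooled ** 2 + (1 - pooled) ** 2

    return {
        "cohen": (accuracy - e_cohen) / (1 - e_cohen),
        "fleiss": (accuracy - e_fleiss) / (1 - e_fleiss),
        "informedness": tp / real_pos - fp / real_neg,
        "matthews": (tp * tn - fp * fn)
        / sqrt(real_pos * real_neg * pred_pos * pred_neg),
    }


if __name__ == "__main__":
    base = agreement_measures(tp=45, fn=5, fp=15, tn=135)
    # Same per-class rates, ten times as many real negatives.
    skewed = agreement_measures(tp=45, fn=5, fp=150, tn=1350)
    for name in base:
        print(f"{name:13s} {base[name]:.4f} {skewed[name]:.4f}")
```

In the second table only the number of real negatives changes while the per-class rates stay fixed, so Informedness is identical in both runs and the other three measures shift. This illustrates sensitivity to prevalence only; it does not reproduce the opposite-skew deployment scenario analysed in the paper.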

References (44)
Christopher D. Manning, Hinrich Schütze, Foundations of Statistical Natural Language Processing (1999)
Uzay Kaymak, Arie Ben-David, Rob Potharst, AUK: a simple alternative to the AUC. ERIM Report Series Research in Management, Erasmus Research Institute of Management (2010)
William A. Grove, Statistical Methods for Rates and Proportions, 2nd ed. American Journal of Psychiatry, vol. 138 (1981), 10.1176/AJP.138.12.1644-A
David R. Shanks, Is human learning rational? Quarterly Journal of Experimental Psychology, vol. 48, pp. 257-279 (1995), 10.1080/14640749508401390
Holly Skodol Wilson, Research in nursing (1985)
P. J. G. Lisboa, A. Vellido, H. Wong, Bias reduction in skewed binary classification with Bayesian neural networks. Neural Networks, vol. 13, pp. 407-410 (2000), 10.1016/S0893-6080(00)00022-8
Johannes Fürnkranz, Peter A. Flach, ROC 'n' rule learning: towards a better understanding of covering algorithms. Machine Learning, vol. 58, pp. 39-77 (2005), 10.1007/S10994-005-5011-X
K. Pearson, D. Heron, On Theories of Association. Biometrika, vol. 9, pp. 159-315 (1913), 10.1093/BIOMET/9.1-2.159
Klaus Krippendorff, Estimating the Reliability, Systematic Error and Random Error of Interval Data. Educational and Psychological Measurement, vol. 30, pp. 61-70 (1970), 10.1177/001316447003000105