Factors affecting intercoder reliability: a Monte Carlo experiment

Author: Guangchao Charles Feng

DOI: 10.1007/S11135-012-9745-9

Keywords: Inter-rater reliability, Statistics, Psychology, Monte Carlo method, Response Parameters, Reliability (statistics), Practical implications, Econometrics

Abstract: Although it has long been a consensus that intercoder reliability is crucial to the validity of a content analysis study, the choice among reliability indices is still debated. This study reviewed and empirically tested the most popular indices, aiming to find an index that is robust against prevalence and rater bias, by testing their relationships with response surface methodology through a Monte Carlo experiment. Maxwell's R.E. was found to be superior to Krippendorff's α, Scott's π, Cohen's κ, Perreault and Leigh's I_r, and Gwet's AC1. More nuanced relationships among prevalence, sensitivity, specificity, and the indices were discovered through the plots. Both theoretical and practical implications are also discussed at the end.
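To make the comparison in the abstract concrete, the following is a minimal sketch, not the study's actual procedure. It assumes two coders, one binary category, and coders who share the same sensitivity and specificity; the function names `agreement_indices` and `simulate_table` are hypothetical. It computes four of the indices named above in their usual two-category forms (Krippendorff's α and I_r are omitted for brevity) and draws one simulated coding table per prevalence level.

```python
import random


def agreement_indices(a, b, c, d):
    """Agreement indices for a 2x2 table of two coders.

    a = both code 1, d = both code 0,
    b = coder A codes 1 and coder B codes 0, c = the reverse.
    """
    n = a + b + c + d
    po = (a + d) / n                     # observed agreement
    pa, pb = (a + b) / n, (a + c) / n    # each coder's rate of coding 1
    pbar = (pa + pb) / 2                 # pooled rate of coding 1

    def chance_corrected(pe):
        # (po - pe) / (1 - pe); undefined when expected agreement is 1
        return float("nan") if pe == 1 else (po - pe) / (1 - pe)

    return {
        "percent_agreement": po,
        "cohen_kappa": chance_corrected(pa * pb + (1 - pa) * (1 - pb)),
        "scott_pi": chance_corrected(pbar ** 2 + (1 - pbar) ** 2),
        "gwet_ac1": chance_corrected(2 * pbar * (1 - pbar)),
        "maxwell_re": 2 * po - 1,        # two-category form only
    }


def simulate_table(n, prevalence, sensitivity, specificity, rng):
    """Simulate n units coded independently by two equally accurate coders."""
    a = b = c = d = 0
    for _ in range(n):
        truth = rng.random() < prevalence
        p_one = sensitivity if truth else 1 - specificity
        x = rng.random() < p_one         # coder A's code
        y = rng.random() < p_one         # coder B's code
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return a, b, c, d


if __name__ == "__main__":
    rng = random.Random(2012)            # fixed seed, arbitrary choice
    for prevalence in (0.5, 0.2, 0.05):
        table = simulate_table(1000, prevalence, 0.9, 0.9, rng)
        indices = agreement_indices(*table)
        print(prevalence, {k: round(v, 3) for k, v in indices.items()})
```

Holding coder accuracy fixed while varying prevalence is the kind of manipulation the abstract describes; a full experiment would repeat each condition many times and also vary sensitivity and specificity.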
