Reliability studies of diagnostic tests are not using enough observers for robust estimation of interobserver agreement: a simulation study

Authors: Mohsen Sadatsafavi, Mehdi Najafzadeh, Larry Lynd, Carlo Marra

DOI: 10.1016/J.JCLINEPI.2007.10.023

Keywords:

Abstract:

Objective: Any attempt to generalize the performance of a subjective diagnostic method should take into account sampling variation in both cases and readers. Most current measures of test performance, especially indices of reliability, only address variation in cases and hence are not suitable for generalizing results across a population of readers. We attempted to study the effect of the number of readers on two measures of multireader reliability: pair-wise agreement and Fleiss' kappa.

Study Design and Setting: We used a normal hierarchical model with a latent trait (signal) variable to simulate a binary decision-making task performed by different numbers of readers on an infinite number of cases.

Results: It could be shown that both measures, especially kappa, have large variance when estimated with a small number of readers, casting doubt on their accuracy given the number of readers typically used in reliability studies.

Conclusion: The majority of reliability studies is likely limited by the number of readers and unlikely to produce a reliable estimate of reader agreement.
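The simulation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual code: the latent-trait model draws a signal per case, adds reader-specific perception noise, and applies a reader-specific threshold; the variance parameters (`signal_sd`, `noise_sd`, `thresh_sd`) are hypothetical choices, and a large finite case set stands in for the paper's "infinite number of cases". Fleiss' kappa is computed from the standard formula for a cases-by-readers binary rating matrix.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_ratings(n_readers, n_cases, signal_sd=1.0, noise_sd=1.0, thresh_sd=0.5):
    """Binary decisions under a normal hierarchical latent-trait model:
    each case carries a latent signal; each reader perceives signal + noise
    and applies a reader-specific threshold (all parameter values hypothetical)."""
    signal = rng.normal(0.0, signal_sd, size=n_cases)           # case effect
    thresholds = rng.normal(0.0, thresh_sd, size=n_readers)     # reader effect
    noise = rng.normal(0.0, noise_sd, size=(n_cases, n_readers))
    return (signal[:, None] + noise > thresholds[None, :]).astype(int)

def fleiss_kappa(ratings):
    """Fleiss' kappa for an (n_cases, n_readers) 0/1 rating matrix."""
    n_cases, n_readers = ratings.shape
    pos = ratings.sum(axis=1)                          # positive calls per case
    counts = np.stack([n_readers - pos, pos], axis=1)  # per-category counts
    p_j = counts.sum(axis=0) / (n_cases * n_readers)   # marginal category props
    P_i = (counts * (counts - 1)).sum(axis=1) / (n_readers * (n_readers - 1))
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()
    return (P_bar - P_e) / (1.0 - P_e)

def kappa_sd(n_readers, n_cases=2000, reps=100):
    """Spread of the kappa estimate across repeated samples of readers."""
    return np.std([fleiss_kappa(simulate_ratings(n_readers, n_cases))
                   for _ in range(reps)])
```

Repeating the estimation while resampling readers shows the abstract's central point: with a large case set, the kappa estimate still varies substantially from one small reader panel to the next (e.g. `kappa_sd(3)` versus `kappa_sd(20)`), because reader sampling, not case sampling, dominates the error.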
