Reliability studies of diagnostic tests are not using enough observers for robust estimation of interobserver agreement: a simulation study

Authors: Mohsen Sadatsafavi, Mehdi Najafzadeh, Larry Lynd, Carlo Marra

DOI: 10.1016/J.JCLINEPI.2007.10.023

Keywords:

Abstract:

Objective: Any attempt to generalize the performance of a subjective diagnostic method should take into account sampling variation in both cases and readers. Most current measures of test performance, especially indices of reliability, only address variation in cases and hence are not suitable for generalizing results across a population of readers. We attempted to study the effect of the number of readers on two measures of multireader reliability: pair-wise agreement and Fleiss' kappa.

Study Design and Setting: We used a normal hierarchical model with a latent trait (signal) variable to simulate a binary decision-making task performed by different numbers of readers on an infinite number of cases.

Results: It could be shown that both measures, especially kappa, have large variance when estimated with a small number of readers, casting doubt on their accuracy given the number of readers typically used in reliability studies.

Conclusion: The majority of reliability studies is likely limited by the number of readers and unlikely to produce a reliable estimate of reader agreement.
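The simulation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual code: the latent-trait model draws a signal per case, adds reader-specific perception noise, and applies a reader-specific threshold; the variance parameters (`signal_sd`, `noise_sd`, `thresh_sd`) are hypothetical choices, and a large finite case set stands in for the paper's "infinite number of cases". Fleiss' kappa is computed from the standard formula for a cases-by-readers binary rating matrix.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_ratings(n_readers, n_cases, signal_sd=1.0, noise_sd=1.0, thresh_sd=0.5):
    """Binary decisions under a normal hierarchical latent-trait model:
    each case carries a latent signal; each reader perceives signal + noise
    and applies a reader-specific threshold (all parameter values hypothetical)."""
    signal = rng.normal(0.0, signal_sd, size=n_cases)           # case effect
    thresholds = rng.normal(0.0, thresh_sd, size=n_readers)     # reader effect
    noise = rng.normal(0.0, noise_sd, size=(n_cases, n_readers))
    return (signal[:, None] + noise > thresholds[None, :]).astype(int)

def fleiss_kappa(ratings):
    """Fleiss' kappa for an (n_cases, n_readers) 0/1 rating matrix."""
    n_cases, n_readers = ratings.shape
    pos = ratings.sum(axis=1)                          # positive calls per case
    counts = np.stack([n_readers - pos, pos], axis=1)  # per-category counts
    p_j = counts.sum(axis=0) / (n_cases * n_readers)   # marginal category props
    P_i = (counts * (counts - 1)).sum(axis=1) / (n_readers * (n_readers - 1))
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()
    return (P_bar - P_e) / (1.0 - P_e)

def kappa_sd(n_readers, n_cases=2000, reps=100):
    """Spread of the kappa estimate across repeated samples of readers."""
    return np.std([fleiss_kappa(simulate_ratings(n_readers, n_cases))
                   for _ in range(reps)])
```

Repeating the estimation while resampling readers shows the abstract's central point: with a large case set, the kappa estimate still varies substantially from one small reader panel to the next (e.g. `kappa_sd(3)` versus `kappa_sd(20)`), because reader sampling, not case sampling, dominates the error.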
