Exploration of Analysis Methods for Diagnostic Imaging Tests: Problems with ROC AUC and Confidence Scores in CT Colonography

Authors: Susan Mallett, Steve Halligan, Gary S. Collins, Doug G. Altman

DOI: 10.1371/JOURNAL.PONE.0107633

Abstract: Background: Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity with area under the Receiver Operating Characteristic Curve (ROC AUC), for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection. Methods: In a multireader multicase study of 10 readers and 107 cases we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, to ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods. Results: Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems using confidence scores: in assigning scores to all cases; in the use of zero scores when no polyps were identified; the bimodal non-normal distribution of scores; fitting ROC curves due to extrapolation beyond the study data; and the undue influence of a few false positive results. Variation due to use of different ROC methods exceeded differences between test results for ROC AUC. Conclusions: The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity were a more reliable and clinically appropriate method to compare diagnostic tests.
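The contrast drawn in the abstract can be illustrated with a minimal sketch. The data below are invented for illustration and are not taken from the study; the function names `sensitivity_specificity` and `empirical_auc` are hypothetical, and the empirical (Mann-Whitney) AUC used here is only one of several ROC methods, not necessarily the one the authors fitted. The sketch assumes a case is reported positive whenever its confidence score is above zero, so cases with no polyp identified receive a score of zero.

```python
def sensitivity_specificity(truth, calls):
    """Sensitivity and specificity from binary truth and binary reader calls."""
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    fn = sum(1 for t, c in zip(truth, calls) if t and not c)
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(truth, scores):
    """Empirical ROC AUC: the fraction of (diseased, non-diseased) pairs in
    which the diseased case has the higher score, counting ties as half."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 4 cases with polyps, 6 without (1 = polyp present).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Confidence scores (0-100). A score of 0 means no polyp was reported.
# The two scenarios differ only in the confidence given to one false positive.
scores_a = [90, 80, 0, 0, 0, 0, 0, 0, 0, 70]
scores_b = [90, 80, 0, 0, 0, 0, 0, 0, 0, 95]

# Assumed reporting rule: any score above zero is a positive call.
calls_a = [s > 0 for s in scores_a]
calls_b = [s > 0 for s in scores_b]

sens_a, spec_a = sensitivity_specificity(truth, calls_a)
sens_b, spec_b = sensitivity_specificity(truth, calls_b)
auc_a = empirical_auc(truth, scores_a)
auc_b = empirical_auc(truth, scores_b)

print(round(sens_a, 3), round(spec_a, 3), round(auc_a, 3))  # 0.5 0.833 0.708
print(round(sens_b, 3), round(spec_b, 3), round(auc_b, 3))  # 0.5 0.833 0.625
```

In this toy case the binary reports are identical in both scenarios, so sensitivity and specificity do not move, while the AUC shifts because of the score attached to a single false positive. The many tied zero scores also contribute only half-credit pairs, which is one way zero scores can shape the AUC.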
