Looking for a reference for large datasets: relative reliability of visual and automatic sleep scoring

Authors: C. Berthomier, V. Muto, C. Schmidt, G. Vandewalle, M. Jaspar

DOI: 10.1101/576090

Keywords: Sleep scoring, Kappa, Relative reliability, Pairwise comparison, Artificial intelligence, Healthy individuals, Machine learning, Computer science

Abstract:

Study Objectives: New challenges in sleep science require describing fine-grained phenomena or dealing with large datasets. Besides the human-resource challenge of scoring huge datasets, inter- and intra-expert variability may also reduce the sensitivity of such studies. Searching for a way to disentangle the variability induced by the scoring method from the variability present in the actual data, visual and automatic scorings of healthy individuals were examined.

Methods: A first dataset (DS1, 4 recordings) scored by 6 experts plus an autoscoring algorithm was used to characterize inter-scoring variability. A second dataset (DS2, 88 recordings), scored a few weeks later, was used to investigate intra-expert variability. Percentage agreements and Conger's kappa were derived from epoch-by-epoch comparisons on pairwise, consensus, and majority scorings.

Results: On DS1, the number of epochs of agreement decreased as the number of experts increased, for both consensus and majority scoring, with agreement ranging from 86% (pairwise) to 69% (all experts). Adding the autoscoring changed the kappa value from 0.81 to 0.79. Agreement between the autoscoring and the experts' consensus was 93%. DS2 evidenced a systematic decrease in each single expert's agreement between the two datasets (0.75 vs 0.70).

Conclusions: Visual scoring induces variability that is difficult to address, especially in big-data studies. When proven reliable, and being perfectly reproducible, automatic scoring methods can cope with intra-scorer variability, making them a sensible option when dealing with large datasets.

Statement of Significance: We confirmed and extended previous findings highlighting intra- and inter-expert variability in sleep scoring. Since those issues cannot be completely addressed by either practical or statistical solutions such as group training, an automated scoring may be reasonably imperfect, yet it can serve as a reference.
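The agreement metrics in the abstract are standard epoch-by-epoch statistics: percentage agreement counts the epochs on which two scorers assign the same stage, and Conger's kappa is the multi-rater generalization of Cohen's kappa, in which chance agreement for each pair of scorers is computed from those scorers' own stage proportions and then averaged over all pairs. As a minimal illustrative sketch (the array shapes, stage labels, and function names below are assumptions for illustration, not taken from the study's code):

```python
import itertools
import numpy as np

def pairwise_percent_agreement(scores: np.ndarray) -> float:
    """Mean epoch-by-epoch agreement over all pairs of scorers.

    scores: (n_epochs, n_scorers) array of stage labels,
            e.g. "W", "N1", "N2", "N3", "REM".
    """
    pairs = itertools.combinations(range(scores.shape[1]), 2)
    return float(np.mean([np.mean(scores[:, j] == scores[:, k]) for j, k in pairs]))

def conger_kappa(scores: np.ndarray) -> float:
    """Conger's kappa: kappa = (P_obs - P_exp) / (1 - P_exp), where both
    terms are averaged over all pairs of scorers and P_exp uses each
    scorer's own marginal stage proportions."""
    n_epochs, n_scorers = scores.shape
    stages = np.unique(scores)
    # p[j, c] = proportion of epochs scorer j labeled as stage c
    p = np.array([[np.mean(scores[:, j] == c) for c in stages]
                  for j in range(n_scorers)])
    pairs = list(itertools.combinations(range(n_scorers), 2))
    p_obs = np.mean([np.mean(scores[:, j] == scores[:, k]) for j, k in pairs])
    p_exp = np.mean([np.dot(p[j], p[k]) for j, k in pairs])
    return float((p_obs - p_exp) / (1 - p_exp))

# Toy example: 6 hypothetical "experts" rescoring one night of 30-s epochs,
# each deviating randomly from a common hypnogram ~15% of the time.
rng = np.random.default_rng(0)
stages = np.array(["W", "N1", "N2", "N3", "REM"])
truth = rng.choice(stages, size=960)
scores = np.column_stack([
    np.where(rng.random(960) < 0.85, truth, rng.choice(stages, size=960))
    for _ in range(6)
])
print(f"pairwise agreement: {pairwise_percent_agreement(scores):.2%}")
print(f"Conger's kappa:     {conger_kappa(scores):.2f}")
```

As an arithmetic check on the formula only (the chance-agreement figure is hypothetical, not reported in the abstract): an observed pairwise agreement of 0.86 with a chance agreement of about 0.26 gives kappa = (0.86 - 0.26) / (1 - 0.26) ≈ 0.81, the same order as the kappas quoted above.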
