Authors: C. Berthomier, V. Muto, C. Schmidt, G. Vandewalle, M. Jaspar
DOI: 10.1101/576090
Keywords: Sleep scoring, Kappa, Relative reliability, Pairwise comparison, Artificial intelligence, Healthy individuals, Machine learning, Computer science
Abstract: Study Objectives: New challenges in sleep science require describing fine-grained phenomena or dealing with large datasets. Besides the human-resource challenge of scoring huge datasets, inter- and intra-expert variability may also reduce the sensitivity of such studies. Searching for a way to disentangle the variability induced by the scoring method from that present in the actual data, visual and automatic scorings of healthy individuals were examined. Methods: A first dataset (DS1, 4 recordings) scored by 6 experts plus an autoscoring algorithm was used to characterize inter-scoring variability. A second dataset (DS2, 88 recordings), scored a few weeks later, was used to investigate intra-expert variability. Percentage agreements and Conger's kappa were derived from epoch-by-epoch comparisons on pairwise, consensus and majority scorings. Results: On DS1 the number of epochs of agreement decreased as the number of experts increased, in both consensus and majority scoring, with agreement ranging from 86% (pairwise) to 69% (all experts). Adding the autoscoring changed the kappa value from 0.81 to 0.79. Agreement between the autoscoring and the consensus scoring was 93%. DS2 evidenced a systematic decrease in each single expert's agreement between the datasets (0.75 to 0.70). Conclusions: Visual scoring induces variability that is difficult to address, especially in big-data studies. When proven to be reliable, and being perfectly reproducible, automatic methods can cope with intra-scorer variability, making them a sensible option when dealing with large datasets. Statement of Significance: We confirmed and extended previous findings highlighting intra- and inter-expert variability in sleep scoring. Those issues cannot be completely addressed by practical or statistical solutions such as group training. An automated scoring may be reasonably considered imperfect, but it can serve as a reference.
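The abstract's metrics (epoch-by-epoch percentage agreement and Conger's multi-rater kappa) can be illustrated with a short sketch. This is not the authors' code: the Python below, the integer stage coding, and the toy hypnograms are assumptions for illustration, and the chance correction follows Gwet's formulation of Conger's kappa, which reduces to Cohen's kappa when only two scorers are compared.

```python
import numpy as np

STAGES = ["W", "N1", "N2", "N3", "REM"]  # illustrative AASM-style labels, integer-coded 0..4

def percent_agreement(score_a, score_b):
    """Epoch-by-epoch percentage agreement between two hypnograms."""
    a, b = np.asarray(score_a), np.asarray(score_b)
    return float(np.mean(a == b))

def conger_kappa(scores):
    """Conger's multi-rater kappa for an (n_raters, n_epochs) array of stage labels.

    Observed agreement is the average proportion of agreeing rater pairs per
    epoch; chance agreement uses rater-specific category proportions corrected
    by their between-rater variance (Gwet's formulation of Conger's kappa).
    """
    scores = np.asarray(scores)
    r, n = scores.shape
    cats = np.unique(scores)

    # counts[i, k]: how many raters assigned category k to epoch i
    counts = np.stack([(scores == k).sum(axis=0) for k in cats], axis=1)

    # Observed agreement: mean proportion of agreeing rater pairs per epoch
    p_o = float(np.mean((counts * (counts - 1)).sum(axis=1) / (r * (r - 1))))

    # p_jk: proportion of epochs rater j assigned to category k
    p_jk = np.stack([(scores == k).mean(axis=1) for k in cats], axis=1)
    p_bar = p_jk.mean(axis=0)                                   # mean over raters
    s2 = p_jk.var(axis=0, ddof=1) if r > 1 else np.zeros_like(p_bar)

    # Chance agreement with Conger's between-rater correction
    p_e = float(np.sum(p_bar ** 2 - s2 / r))
    return (p_o - p_e) / (1.0 - p_e)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 3 "experts" scoring 1,000 epochs, agreeing with a common base ~90% of the time
    base = rng.integers(0, len(STAGES), size=1000)
    raters = np.stack([
        np.where(rng.random(1000) < 0.9, base, rng.integers(0, len(STAGES), size=1000))
        for _ in range(3)
    ])
    print("pairwise agreement:", percent_agreement(raters[0], raters[1]))
    print("Conger's kappa:", conger_kappa(raters))
```

As a sanity check on the sketch, with two raters the chance term sum_k (p̄_k² − s_k²/r) collapses to sum_k p_1k·p_2k, i.e. Cohen's expected agreement.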