Authors: Anisha Keshavan, Jason D. Yeatman, Ariel Rokem
DOI: 10.1101/363382
Keywords:
Abstract: Research in many fields has become increasingly reliant on large and complex datasets. "Big Data" holds untold promise to rapidly advance science by tackling new questions that cannot be answered with smaller datasets. While powerful, research with Big Data poses unique challenges, as many standard lab protocols rely on experts examining each one of the samples. This is not feasible for large-scale datasets because manual approaches are time-consuming and hence difficult to scale. Meanwhile, automated approaches lack the accuracy of examination by highly trained scientists, and this may introduce major errors, sources of noise, and unforeseen biases into these large and complex datasets. Our proposed solution is to 1) start with a small, expertly labelled dataset, 2) amplify labels through web-based tools that engage citizen scientists, and 3) train machine learning on amplified labels to emulate expert decision making. As a proof of concept, we developed a system to quality control a large dataset of three-dimensional magnetic resonance images (MRI) of human brains. An initial dataset of 200 brain images labeled by experts was amplified by citizen scientists to label 722 brains, with over 80,000 ratings done through a simple web interface. A deep learning algorithm was then trained to predict data quality, based on a combination of the citizen scientist labels that accounts for differences in the quality of classification by different citizen scientists. In an ROC analysis (on left out test data), the deep learning network performed as well as a state-of-the-art, specialized algorithm (MRIQC) for quality control of T1-weighted images, each with an area under the curve of 0.99. Finally, as a specific practical application of the method, we explore how brain image quality relates to the replicability of a well established relationship between brain volume and age over development. Combining citizen science and deep learning can generalize and scale expert decision making; this is particularly important in emerging disciplines where specialized, automated tools do not already exist.
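The label-amplification and ROC steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how citizen scientist labels are combined, so the accuracy-based weighting below is a simple stand-in chosen for illustration. All names (`rater_weights`, `aggregate`, `auc`) and the toy ratings are hypothetical.

```python
from collections import defaultdict


def rater_weights(ratings, gold):
    """Weight each rater by accuracy on expert-labelled (gold) images.

    Assumption: accuracy weighting is an illustrative stand-in, not the
    combination method used in the paper.
    """
    hits, seen = defaultdict(int), defaultdict(int)
    for rater, image, vote in ratings:
        if image in gold:
            seen[rater] += 1
            hits[rater] += int(vote == gold[image])
    return {rater: hits[rater] / seen[rater] for rater in seen}


def aggregate(ratings, weights, default=0.5):
    """Weighted mean vote per image; raters never seen on gold get `default`."""
    num, den = defaultdict(float), defaultdict(float)
    for rater, image, vote in ratings:
        w = weights.get(rater, default)
        num[image] += w * vote
        den[image] += w
    return {img: num[img] / den[img] for img in num if den[img] > 0}


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (rater, image, vote), where 1 = pass and 0 = fail.
ratings = [
    ("a", "img1", 1), ("a", "img2", 0), ("a", "img3", 1),
    ("b", "img1", 0), ("b", "img2", 0), ("b", "img3", 0),
    ("c", "img1", 1), ("c", "img2", 1), ("c", "img3", 1),
]
gold = {"img1": 1, "img2": 0}  # expert labels for a small subset

weights = rater_weights(ratings, gold)
scores = aggregate(ratings, weights)
gold_images = sorted(gold)
gold_auc = auc([scores[i] for i in gold_images], [gold[i] for i in gold_images])
```

Here `scores["img3"]` is an amplified label for an image no expert rated; in the pipeline the abstract describes, labels of this kind would serve as training targets for the deep learning model.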