Authors: Nathan P. Golightly, Avery Bell, Anna I. Bischoff, Parker D. Hollingsworth, Stephen R. Piccolo
Keywords:
Abstract: One important use of genome-wide transcriptional profiles is to identify relationships between transcription levels and patient outcomes. These translational insights can guide the development of biomarkers for clinical application. Data from thousands of translational-biomarker studies have been deposited in public repositories, enabling reuse. However, data-reuse efforts require considerable time and expertise because data are generated using heterogeneous profiling technologies, preprocessed using diverse normalization procedures, and annotated in non-standard ways. To address this problem, we curated 45 publicly available translational-biomarker datasets from a variety of human diseases. To increase the data's utility, we reprocessed the raw expression data using a uniform computational pipeline, addressed quality-control problems, mapped the annotations to a controlled vocabulary, and prepared consistently structured, analysis-ready files. These data, along with the scripts used to prepare them, are available in a public repository. We believe these data will be particularly useful to researchers seeking to perform benchmarking studies, for example to compare and optimize machine-learning algorithms' ability to predict biomedical outcomes. A machine-accessible metadata file describing the reported data is available in ISA-Tab format.