Authors: Ben D. Fulcher, Aurina Arnatkevičiūtė, Alex Fornito
DOI: 10.1101/2020.04.24.058958
Keywords: Transcriptional activity, Set (psychology), Null (SQL), Biology, Cell density, Spatial analysis, Transcriptome, Gene, Computational biology, Phenotype
Abstract: The recent availability of whole-brain atlases of gene expression, which quantify the transcriptional activity of thousands of genes across many different brain regions, has opened new opportunities to understand how gene-expression patterns relate to spatially varying properties of brain structure and function. To aid interpretation of a given neural phenotype, gene-set enrichment analysis (GSEA) has become a standard statistical methodology to identify functionally related groups of genes, annotated using systems such as the Gene Ontology (GO), that are associated with a given phenotype. While GSEA has identified groups of genes related to diverse aspects of brain structure and function in mouse and human, here we show that these results are affected by substantial biases. Quantifying the false-positive rates of individual GO categories across an ensemble of random phenotypic maps, we found an average 875-fold inflation of significant findings relative to expectation in mouse, and a 582-fold inflation in human, with some categories being judged as significant for over 20% of random phenotypes. Concerningly, the probability of a GO category being reported as significant in the extant literature increases with its estimated false-positive rate, suggesting that published reports are strongly affected by reporting bias. We show that the bias is primarily driven by within-category gene-gene coexpression and spatial autocorrelation, which are not accounted for in conventional nulls, and we introduce flexible ensemble-based null models that can account for these effects. Testing a range of structural connectivity and cell density phenotypes in mouse and human, we demonstrate that many GO categories that would conventionally be judged as highly significant are in fact consistent with ensembles of random phenotypes. Our results highlight major pitfalls of applying GSEA to brain-wide transcriptomic data and outline solutions to this pervasive problem, which we have made available as an open toolbox.
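The idea of an ensemble-based null can be illustrated with a minimal sketch. This is not the authors' toolbox: the function names, the choice of mean Pearson correlation as the category score, and the use of simple shuffling to generate random phenotypes are all assumptions made for illustration. In particular, shuffling destroys spatial autocorrelation, so a spatially autocorrelated ensemble would need a different generator than the hypothetical `shuffled_phenotypes` below. What the sketch does preserve is the key point that the genes in a category stay fixed (keeping their coexpression intact) while the phenotype is randomized.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def category_score(expression, phenotype):
    """Mean gene-phenotype correlation over the genes in one category.

    `expression` is a list of per-gene regional profiles (assumed layout).
    """
    return sum(pearson(gene, phenotype) for gene in expression) / len(expression)


def shuffled_phenotypes(phenotype, n_null, seed=0):
    """Hypothetical generator: random phenotypes made by permuting regions."""
    rng = random.Random(seed)
    nulls = []
    for _ in range(n_null):
        p = list(phenotype)
        rng.shuffle(p)
        nulls.append(p)
    return nulls


def ensemble_null_p_value(expression, phenotype, null_phenotypes):
    """Permutation-style p-value of a category score against a phenotype ensemble.

    The category's genes are held fixed, so within-category coexpression
    shapes the null distribution in the same way it shapes the observed score.
    """
    observed = abs(category_score(expression, phenotype))
    null_scores = [abs(category_score(expression, p)) for p in null_phenotypes]
    exceed = sum(s >= observed for s in null_scores)
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (len(null_scores) + 1)
```

A category would then be called significant only if its observed score is extreme relative to the scores the same genes obtain across the ensemble of random phenotypes, rather than relative to scores obtained by randomizing gene labels.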