Authors: Gustavo Glusman, Juan Caballero, Max Robinson, Burak Kutlu, Leroy Hood
DOI: 10.1371/JOURNAL.PONE.0077885
Keywords:
Abstract: Deep sequencing of transcriptomes has become an indispensable tool for biology, enabling the expression levels of thousands of genes to be compared across multiple samples. Since transcript counts scale with sequencing depth, counts from different samples must be normalized to a common scale prior to comparison. We analyzed fifteen existing and novel algorithms for normalizing transcript counts, and evaluated the effectiveness of the resulting normalizations. For this purpose we defined two mutually independent metrics: (1) the number of “uniform” genes (genes whose normalized expression levels have a sufficiently low coefficient of variation), and (2) the Spearman correlation between the normalized expression profiles of gene pairs. We also define four novel algorithms, one of which explicitly maximizes the number of uniform genes, and compared the performance of all fifteen algorithms. The most commonly used methods (scaling to a fixed total value, or equalizing the expression of certain ‘housekeeping’ genes) yielded particularly poor results, surpassed even by normalization based on randomly selected gene sets. Conversely, seven of the algorithms approached what appears to be optimal normalization. Three of these algorithms rely on the identification of “ubiquitous” genes: genes expressed in all the samples studied, but never at very high levels. We demonstrate that these include a “core” of genes expressed in many tissues in a mutually consistent pattern, which is suitable for use as an internal normalization guide. The new methods yield robustly normalized expression values, a prerequisite for the identification of differentially expressed and tissue-specific genes as potential biomarkers.
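The first metric in the abstract, the number of "uniform" genes, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy count matrix, the target total of one million, and the coefficient-of-variation cutoff of 0.25 are all assumptions chosen for illustration, since the abstract specifies none of them. The normalization shown is scaling to a fixed total, used here only because it is the simplest to write down; the abstract reports that this method performed poorly on real data.

```python
from statistics import mean, pstdev


def scale_to_total(counts, target=1_000_000):
    """Scale each sample (column) so that its counts sum to `target`.

    `counts` is a list of rows, one row per gene, one column per sample.
    The default `target` is an illustrative assumption.
    """
    n_samples = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [row[j] * target / totals[j] for j in range(n_samples)]
        for row in counts
    ]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else float("inf")


def count_uniform_genes(expression, cv_cutoff=0.25):
    """Count genes whose CV across samples is at or below `cv_cutoff`.

    The cutoff is a hypothetical value, not taken from the paper.
    """
    return sum(
        1 for row in expression
        if coefficient_of_variation(row) <= cv_cutoff
    )


# Toy data (invented): four genes, two samples; the second sample was
# sequenced twice as deeply, so every raw count is doubled.
raw_counts = [
    [100, 200],
    [200, 400],
    [300, 600],
    [400, 800],
]

uniform_before = count_uniform_genes(raw_counts)
uniform_after = count_uniform_genes(scale_to_total(raw_counts))
print(uniform_before, uniform_after)
```

In this toy case no gene looks uniform before normalization, because the depth difference inflates every gene's variation, and all four do afterwards. Comparing normalization methods by this count follows the idea described in the abstract, where a better normalization yields more uniform genes.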