Scalable preprocessing for sparse scRNA-seq data exploiting prior knowledge.

Authors: Sumit Mukherjee, Yue Zhang, Joshua Fan, Georg Seelig, Sreeram Kannan

DOI: 10.1093/BIOINFORMATICS/BTY293

Keywords: Python (programming language), Source code, Scalability, Preprocessor, Transcriptome, Data mining, Sequence analysis, Visualization, Cluster analysis, Matrix decomposition, Computer science

Abstract: Motivation: Single cell RNA-seq (scRNA-seq) data contains a wealth of information which has to be inferred computationally from the observed sequencing reads. As the ability to sequence more cells improves rapidly, existing computational tools suffer from three problems. (i) The decreased reads-per-cell implies a highly sparse sample of the true cellular transcriptome. (ii) Many tools simply cannot handle the size of the resulting datasets. (iii) Prior biological knowledge, such as bulk RNA-seq information of certain cell types or qualitative marker information, is not taken into account. Here we present UNCURL, a preprocessing framework based on non-negative matrix factorization for scRNA-seq data, that is able to handle varying sampling distributions, scales to very large cell numbers and can incorporate prior knowledge. Results: We find that preprocessing using UNCURL consistently improves the performance of commonly used scRNA-seq tools for clustering, visualization and lineage estimation, both in the absence and presence of prior knowledge. Finally, we demonstrate that UNCURL is extremely scalable and parallelizable, and runs faster than other methods on a scRNA-seq dataset containing 1.3 million cells. Availability and implementation: Source code is available at https://github.com/yjzhang/uncurl_python. Supplementary information: Supplementary data are available at Bioinformatics online.
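
Illustration: the abstract describes preprocessing by non-negative matrix factorization of a sparse count matrix. The following is a minimal sketch of that general idea only, not the UNCURL implementation or its API. It assumes scikit-learn's generic `NMF` with a Kullback-Leibler loss (which corresponds to a Poisson sampling model) as a stand-in; the function name `nmf_preprocess`, the parameter `n_states`, and the simulated data are hypothetical.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import NMF


def nmf_preprocess(counts, n_states, seed=0):
    """Factorize a non-negative cells-by-genes matrix as counts ~ W @ H.

    Hypothetical helper: uses scikit-learn's generic NMF, not UNCURL.
    """
    model = NMF(
        n_components=n_states,
        init="nndsvda",                # SVD-based init without exact zeros
        solver="mu",                   # multiplicative updates, needed for KL loss
        beta_loss="kullback-leibler",  # KL loss matches a Poisson count model
        max_iter=300,
        random_state=seed,
    )
    cell_weights = model.fit_transform(counts)  # shape: cells x states
    state_means = model.components_             # shape: states x genes
    return cell_weights, state_means


# Simulated sparse counts (assumption): 200 cells, 50 genes, 3 cell states.
rng = np.random.default_rng(0)
true_means = rng.gamma(shape=2.0, scale=1.0, size=(3, 50))
labels = rng.integers(0, 3, size=200)
counts = sparse.csr_matrix(rng.poisson(0.3 * true_means[labels]).astype(float))

cell_weights, state_means = nmf_preprocess(counts, n_states=3)

# Low-rank reconstruction that could be passed to downstream tools.
denoised = cell_weights @ state_means
```

In this sketch, `cell_weights` gives each cell's loading on the inferred states, and `denoised` is a dense low-rank estimate of expression that a clustering or visualization tool could consume in place of the raw counts.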
