Authors: Sumit Mukherjee, Yue Zhang, Joshua Fan, Georg Seelig, Sreeram Kannan
DOI: 10.1093/BIOINFORMATICS/BTY293
Keywords: Python (programming language), Source code, Scalability, Preprocessor, Transcriptome, Data mining, Sequence analysis, Visualization, Cluster analysis, Matrix decomposition, Computer science
Abstract: Motivation: Single cell RNA-seq (scRNA-seq) data contains a wealth of information which has to be inferred computationally from the observed sequencing reads. As the ability to sequence more cells improves rapidly, existing computational tools suffer from three problems. (i) The decreased reads-per-cell implies a highly sparse sample of the true cellular transcriptome. (ii) Many tools simply cannot handle the size of the resulting datasets. (iii) Prior biological knowledge, such as bulk RNA-seq information for certain cell types or qualitative marker information, is not taken into account. Here we present UNCURL, a preprocessing framework based on non-negative matrix factorization for scRNA-seq data, that is able to handle varying sampling distributions, scales to very large cell numbers and can incorporate prior knowledge. Results: We find that preprocessing using UNCURL consistently improves the performance of commonly used scRNA-seq tools for clustering, visualization and lineage estimation, both in the absence and in the presence of prior knowledge. Finally, we demonstrate that UNCURL is extremely scalable and parallelizable, and runs faster than other methods on a scRNA-seq dataset containing 1.3 million cells. Availability and implementation: Source code is available at https://github.com/yjzhang/uncurl_python. Supplementary information: Supplementary data are available at Bioinformatics online.
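The abstract describes factorizing a sparse count matrix with non-negative matrix factorization under a sampling distribution suited to the data. The sketch below illustrates that general idea only; it is not the UNCURL implementation or its API (see the repository above for the actual code). It assumes a Poisson sampling model and uses the standard multiplicative updates for the generalized Kullback-Leibler divergence. The function names `matmul`, `poisson_divergence` and `poisson_nmf`, the toy matrix `counts`, and all parameter defaults are hypothetical and chosen for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def count_ratio(x, w, h, eps):
    """Element-wise ratio of observed counts to the reconstruction w @ h."""
    wh = matmul(w, h)
    return [[x_ij / (wh_ij + eps) for x_ij, wh_ij in zip(x_row, wh_row)]
            for x_row, wh_row in zip(x, wh)]


def poisson_divergence(x, w, h, eps=1e-10):
    """Generalized KL divergence between counts x and w @ h.

    Up to terms that do not depend on w and h, this equals the negative
    Poisson log-likelihood of the counts.
    """
    wh = matmul(w, h)
    total = 0.0
    for x_row, wh_row in zip(x, wh):
        for x_ij, wh_ij in zip(x_row, wh_row):
            if x_ij > 0:
                total += x_ij * math.log(x_ij / (wh_ij + eps))
            total += wh_ij - x_ij
    return total


def poisson_nmf(x, k, n_iter=200, seed=0, eps=1e-10):
    """Factorize a genes x cells count matrix as x ~ w @ h with w, h >= 0.

    w is genes x k (one expression profile per component) and
    h is k x cells (component weights for each cell).
    """
    rng = random.Random(seed)
    n_genes, n_cells = len(x), len(x[0])
    # Strictly positive random initialization (an illustrative choice).
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_cells)] for _ in range(k)]

    for _ in range(n_iter):
        # Update h: h <- h * (w^T (x / wh)) / column sums of w.
        ratio = count_ratio(x, w, h, eps)
        w_t = [list(col) for col in zip(*w)]
        numer = matmul(w_t, ratio)
        w_sums = [sum(row) for row in w_t]
        h = [[h[a][j] * numer[a][j] / (w_sums[a] + eps) for j in range(n_cells)]
             for a in range(k)]

        # Update w: w <- w * ((x / wh) h^T) / row sums of h.
        ratio = count_ratio(x, w, h, eps)
        h_t = [list(col) for col in zip(*h)]
        numer = matmul(ratio, h_t)
        h_sums = [sum(row) for row in h]
        w = [[w[i][a] * numer[i][a] / (h_sums[a] + eps) for a in range(k)]
             for i in range(n_genes)]
    return w, h


# Toy genes x cells matrix with two made-up cell groups (not real data).
counts = [
    [9, 8, 10, 0, 1, 0],
    [7, 9, 8, 1, 0, 0],
    [0, 1, 0, 6, 7, 5],
    [1, 0, 0, 8, 6, 7],
]
w, h = poisson_nmf(counts, k=2)
```

The columns of `h` give a low-dimensional, non-negative representation of each cell that could be passed to downstream clustering or visualization tools. Pure-Python lists keep the sketch dependency-free; at realistic dataset sizes an array library with sparse-matrix support would be the practical choice.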