scSNVIndel: accurate and efficient calling of SNVs and indels from single cell sequencing using integrated Bi-LSTM

Authors: Yufeng Wu, Jingyang Gao, Lei Cai

DOI: 10.1109/BIBM49941.2020.9313484

Abstract: Single-cell data are sparse and have coverage fluctuations, making it more difficult than with data obtained from next-generation sequencing (NGS) to call single nucleotide variants (SNVs) and indels. Furthermore, most existing methods are unable to effectively call whole-genome SNVs and indels from single-cell sequencing (SCS) data. In this study, we propose a new method for the efficient identification of SNVs and indels in SCS data, called scSNVIndel. scSNVIndel uses bidirectional long short-term memory (Bi-LSTM) as its base model and integrates natural language processing (NLP) technology. It automatically extracts features and accurately calls SNVs and indels when using SCS data, which is characterized by uneven and discontinuous coverage. Moreover, scSNVIndel can process sequence data directly, retaining valuable information, and does not convert the data into an image as the DeepVariant method does. The results show that scSNVIndel performs better in terms of accuracy and recall when calling variants, compared with other methods. scSNVIndel is currently an open-source method, available at https://github.com/CSuperlei/scSNVIndel, and its usage is published on the following website: https://www.aiguqu.com/2020/06/18/scSNVIndel/.

References (25)
Jing Wang, Cheng Ling, Jingyang Gao. CNNdel: Calling Structural Variations on Low Coverage Data Based on Convolutional Neural Networks. BioMed Research International, vol. 2017, pp. 6375059-6375059 (2017). DOI: 10.1155/2017/6375059
Ryan Poplin, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas Colthurst, Alexander Ku, Dan Newburger, Jojo Dijamco, Nam Nguyen, Pegah T Afshar, Sam S Gross, Lizzie Dorfman, Cory Y McLean, Mark A DePristo. A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, vol. 36, pp. 983-987 (2018). DOI: 10.1038/NBT.4235
Hamid Eghbal-Zadeh, Lukas Fischer, Niko Popitsch, Florian Kromp, Sabine Taschner-Mandl, Teresa Gerber, Eva Bozsaky, Peter F. Ambros, Inge M. Ambros, Gerhard Widmer, Bernhard A. Moser. DeepSNP: An End-to-End Deep Neural Network with Attention-Based Localization for Breakpoint Detection in Single-Nucleotide Polymorphism Array Genomic Data. Journal of Computational Biology, vol. 26, pp. 572-596 (2019). DOI: 10.1089/CMB.2018.0172
Geoffrey Schiebinger, Jian Shu, Marcin Tabaka, Brian Cleary, Vidya Subramanian, Aryeh Solomon, Joshua Gould, Siyan Liu, Stacie Lin, Peter Berube, Lia Lee, Jenny Chen, Justin Brumbaugh, Philippe Rigollet, Konrad Hochedlinger, Rudolf Jaenisch, Aviv Regev, Eric S. Lander. Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming. Cell, vol. 176, pp. 1517- (2019). DOI: 10.1016/J.CELL.2019.01.006
Rabah Alzaidy, Cornelia Caragea, C. Lee Giles. Bi-LSTM-CRF Sequence Labeling for Keyphrase Extraction from Scholarly Documents. The Web Conference, pp. 2551-2557 (2019). DOI: 10.1145/3308558.3313642
Ashwinikumar Kulkarni, Ashley G. Anderson, Devin P. Merullo, Genevieve Konopka. Beyond bulk: a review of single cell transcriptomics methodologies and applications. Current Opinion in Biotechnology, vol. 58, pp. 129-136 (2019). DOI: 10.1016/J.COPBIO.2019.03.001
Guixian Xu, Yueting Meng, Xiaoyu Qiu, Ziheng Yu, Xu Wu. Sentiment Analysis of Comment Texts Based on BiLSTM. IEEE Access, vol. 7, pp. 51522-51532 (2019). DOI: 10.1109/ACCESS.2019.2909919
Kevin Grosselin, Adeline Durand, Justine Marsolier, Adeline Poitou, Elisabetta Marangoni, Fariba Nemati, Ahmed Dahmani, Sonia Lameiras, Fabien Reyal, Olivia Frenoy, Yannick Pousse, Marcel Reichen, Adam Woolfe, Colin Brenan, Andrew D. Griffiths, Céline Vallot, Annabelle Gérard. High-throughput single-cell ChIP-seq identifies heterogeneity of chromatin states in breast cancer. Nature Genetics, vol. 51, pp. 1060-1066 (2019). DOI: 10.1038/S41588-019-0424-9
Jun Ding, Chieh Lin, Ziv Bar-Joseph. Cell lineage inference from SNP and scRNA-Seq data. Nucleic Acids Research, vol. 47 (2019). DOI: 10.1093/NAR/GKZ146
Peijie Lin, Michael Troup, Joshua W. K. Ho. CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data. Genome Biology, vol. 18, pp. 59-59 (2017). DOI: 10.1186/S13059-017-1188-0