RNA-QC-chain: comprehensive and fast quality control for RNA-Seq data

Authors: Qian Zhou, Xiaoquan Su, Gongchao Jing, Songlin Chen, Kang Ning

DOI: 10.1186/S12864-018-4503-6

Keywords: Data mining; Usability; Biology; Deep sequencing; Pipeline (software); Flexibility (engineering); Interpretability; Trimming; Raw data; Suite

Abstract: RNA-Seq has become one of the most widely used applications based on next-generation sequencing technology. However, raw RNA-Seq data may have quality issues, which can significantly distort analytical results and lead to erroneous conclusions. Therefore, the raw data must be subjected to rigorous quality control (QC) procedures before downstream analysis. Currently, an accurate and complete QC of RNA-Seq data requires a suite of different QC tools used consecutively, which is inefficient in terms of usability, running time, file usage, and interpretability of the results. We developed a comprehensive, fast and easy-to-use QC pipeline for RNA-Seq data, RNA-QC-Chain, which involves three steps: (1) sequencing-quality assessment and trimming; (2) internal (ribosomal RNAs) and external (reads from foreign species) contamination filtering; (3) alignment statistics reporting (such as read number, alignment coverage, sequencing depth and paired-end read mapping information). This package was developed based on our previously reported tool for general QC of next-generation sequencing (NGS) data, called QC-Chain, with extensions specifically designed for RNA-Seq data. It has several features that are not yet available in other QC tools for RNA-Seq data, such as RNA sequence trimming, automatic rRNA detection and automatic contaminating species identification. The three QC steps can run either sequentially or independently, making RNA-QC-Chain a comprehensive package with high flexibility and usability. Moreover, parallel computing and optimizations are embedded in most of the QC procedures, providing superior efficiency. The performance of RNA-QC-Chain has been evaluated with different types of datasets, including in-house sequencing data, semi-simulated data, and two real datasets downloaded from a public database. Comparisons of RNA-QC-Chain with other QC tools showed its advantages in both functional versatility and processing speed. We present here a tool, RNA-QC-Chain, which can be used to comprehensively resolve the QC processes of RNA-Seq data effectively and efficiently.
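
As an illustration of step (1) in the abstract, sequencing-quality assessment and trimming, the following is a minimal Python sketch of 3'-end quality trimming followed by a length filter. It is not RNA-QC-Chain's implementation, which the abstract does not describe. The function names, the Phred+33 encoding, and the thresholds `min_q` and `min_len` are all assumptions chosen for the example.

```python
from typing import List, Tuple

# A read is (name, sequence, quality string); this layout is an assumption
# made for the sketch, not a format defined by RNA-QC-Chain.
Read = Tuple[str, str, str]


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def quality_filter(reads: List[Read], min_q: int = 20, min_len: int = 30) -> List[Read]:
    """Trim each read at the 3' end and drop reads shorter than min_len."""
    kept = []
    for name, seq, qual in reads:
        trimmed_seq, trimmed_qual = trim_3prime(seq, qual, min_q)
        if len(trimmed_seq) >= min_len:
            kept.append((name, trimmed_seq, trimmed_qual))
    return kept


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    reads = [
        ("r1", "ACGTACGT", "IIIII###"),
        ("r2", "ACGT", "####"),
    ]
    for read in quality_filter(reads, min_q=20, min_len=4):
        print(read)
```

With these toy reads and `min_len=4`, the first read loses its three low-quality trailing bases and the second is discarded entirely. Real pipelines usually read FASTQ files in a streaming fashion and may use sliding-window trimming instead of this simple end scan.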
