Authors: Qian Zhou, Xiaoquan Su, Gongchao Jing, Songlin Chen, Kang Ning
DOI: 10.1186/S12864-018-4503-6
Keywords: Data mining, Usability, Biology, Deep sequencing, Pipeline (software), Flexibility (engineering), Interpretability, Trimming, Raw data, Suite
Abstract: RNA-Seq has become one of the most widely used applications based on next-generation sequencing technology. However, raw RNA-Seq data may have quality issues, which can significantly distort analytical results and lead to erroneous conclusions. Therefore, the raw data must be subjected to vigorous quality control (QC) procedures before downstream analysis. Currently, an accurate and complete QC of RNA-Seq data requires a suite of different tools used consecutively, which is inefficient in terms of usability, running time, file usage, and interpretability of the results. We developed a comprehensive, fast, and easy-to-use QC pipeline for RNA-Seq data, RNA-QC-Chain, which involves three steps: (1) sequencing-quality assessment and trimming; (2) internal (ribosomal RNAs) and external (reads from foreign species) contamination filtering; (3) alignment statistics reporting (such as read number, coverage, depth, and pair-end mapping information). This package was developed based on our previously reported tool for general QC of next-generation sequencing (NGS) data, called QC-Chain, with extensions specifically designed for RNA-Seq data. It has several features that are not yet available in other QC tools, such as RNA sequence trimming, automatic rRNA detection, and contaminating-species identification. The three steps can run either sequentially or independently, giving RNA-QC-Chain both comprehensiveness and high flexibility and usability. Moreover, parallel computing and optimizations are embedded in the QC procedures, providing superior efficiency. The performance of RNA-QC-Chain has been evaluated on different types of datasets, including in-house and semi-simulated data and two real datasets downloaded from a public database. Comparisons with other QC tools demonstrated its advantages in both functional versatility and processing speed. We present here a tool, RNA-QC-Chain, that can comprehensively resolve the QC processes of RNA-Seq data effectively and efficiently.
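To make the three QC steps more concrete, here is a minimal, generic Python sketch of one plausible operation per step. It is not RNA-QC-Chain's implementation, and all names (`Read`, `trim_3prime`, `kmers`, `is_contaminant`, `coverage_and_depth`) and default thresholds are hypothetical. Several assumptions apply. Quality strings are assumed to use Phred+33 encoding. Step 1 is shown as simple 3'-end quality trimming. Step 2 substitutes a plain k-mer overlap screen against reference sequences (for example, rRNA), because the abstract does not describe how RNA-QC-Chain actually detects contamination. Step 3 computes breadth of coverage and mean depth from 0-based, half-open alignment intervals.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ (Phred+33) quality encoding


@dataclass
class Read:
    """Hypothetical minimal FASTQ record."""
    name: str
    seq: str
    qual: str


# Step 1 (sketch): trim low-quality bases from the 3' end of a read.
def trim_3prime(read: Read, min_q: int = 20, min_len: int = 30) -> Optional[Read]:
    """Drop trailing bases with quality < min_q; return None if the rest is shorter than min_len."""
    end = len(read.seq)
    while end > 0 and ord(read.qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    if end < min_len:
        return None
    return Read(read.name, read.seq[:end], read.qual[:end])


# Step 2 (sketch, swapped-in technique): k-mer overlap screen against reference sequences.
def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_contaminant(read: Read, ref_kmers: Set[str], k: int = 21, min_frac: float = 0.5) -> bool:
    """Flag a read if at least min_frac of its k-mers occur in the reference k-mer set."""
    read_kmers = [read.seq[i:i + k] for i in range(len(read.seq) - k + 1)]
    if not read_kmers:
        return False
    hits = sum(1 for km in read_kmers if km in ref_kmers)
    return hits / len(read_kmers) >= min_frac


# Step 3 (sketch): breadth of coverage and mean depth from alignment intervals.
def coverage_and_depth(intervals: List[Tuple[int, int]], ref_len: int) -> Tuple[float, float]:
    """Return (fraction of reference positions covered, mean per-base depth).

    Intervals are 0-based and half-open: (start, end) covers positions start..end-1.
    """
    depth = [0] * ref_len
    for start, end in intervals:
        for pos in range(max(start, 0), min(end, ref_len)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / ref_len, sum(depth) / ref_len
```

For example, trimming `Read("r1", "ACGTACGTAC", "IIIIIIII##")` with `min_q=20, min_len=5` removes the two trailing bases, which have quality 2. Two alignments `(0, 5)` and `(3, 8)` on a 10-bp reference give 80% coverage and a mean depth of 1.0. Because each function takes and returns plain values, the steps can be called one after another or independently. This mirrors, in miniature, the sequential-or-independent usage the abstract describes.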