Authors: Kexue Li, Yakang Lu, Li Deng, Lili Wang, Lizhen Shi
DOI: 10.7717/PEERJ.8966
Keywords:
Abstract: Metagenome assembly from short next-generation sequencing data is a challenging process due to its large scale and computational complexity. Clustering short reads by species before assembly offers a unique opportunity for parallel downstream assembly of genomes with individualized optimization. However, current read clustering methods suffer from either false negative (under-clustering) or false positive (over-clustering) problems. Here we extended our previous read clustering software, SpaRC, by exploiting statistics derived from multiple samples in a dataset to reduce the under-clustering problem. Using synthetic and real-world datasets, we demonstrated that this method has the potential to cluster almost all of the short reads from genomes with sufficient sequencing coverage. The improved read clustering in turn leads to improved downstream genome assembly quality.