A contig assembly program based on sensitive detection of fragment overlaps.

Author: Xiaoqiu Huang

DOI: 10.1016/S0888-7543(05)80277-0

Keywords: Workstation, Genetics, Software, Algorithm, Set (abstract data type), Dynamic programming, Computer program, Filter (higher-order function), Fragment (computer graphics), Contig, Biology

Abstract: An effective computer program for assembling DNA fragments, the contig assembly program (CAP), has been developed. In the CAP program, a filter is used to quickly eliminate fragment pairs that could not possibly overlap, a dynamic programming algorithm is applied to compute the maximal-scoring overlapping alignment between each remaining pair of fragments, and a simple greedy approach is employed to assemble fragments in order of alignment scores. To identify the true overlaps, the dynamic programming algorithm uses specially chosen sets of alignment parameters to tolerate sequencing errors and to penalize "mutational" changes between different copies of a repetitive sequence. The performance tests of the program on fragment data from genomic sequencing projects produced satisfactory results. The CAP program is efficient in computer time and memory; it took about 4 h to assemble a set of 1015 fragments into long contigs on a Sun workstation.
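The three stages named in the abstract (filter, overlap alignment by dynamic programming, greedy assembly by score) can be illustrated with a small Python sketch. This is not the CAP implementation. The function names (`shares_kmer`, `overlap_score`, `greedy_layout`), the shared k-mer filter, the scoring values (`match=2`, `mismatch=-3`, `gap=-4`), and the `min_score` cutoff are all assumptions chosen for illustration, and the sketch ignores reverse complements and consensus building.

```python
from itertools import permutations

def shares_kmer(a, b, k=4):
    """Cheap filter (assumed form): keep a pair only if it shares some k-mer."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[j:j + k] in kmers for j in range(len(b) - k + 1))

def overlap_score(a, b, match=2, mismatch=-3, gap=-4):
    """Best score of aligning a suffix of `a` with a prefix of `b`."""
    # Row 0: a prefix of b aligned against nothing costs one gap per character.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [0]  # column 0 is free: the overlap may start anywhere in a
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    # The alignment must reach the end of a but may stop at any prefix of b.
    return max(prev)

def greedy_layout(fragments, min_score=6):
    """Order fragments into chains by accepting overlaps from best to worst."""
    candidates = []
    for i, j in permutations(range(len(fragments)), 2):
        if shares_kmer(fragments[i], fragments[j]):
            score = overlap_score(fragments[i], fragments[j])
            if score >= min_score:
                candidates.append((score, i, j))
    candidates.sort(reverse=True)

    succ, pred = {}, {}
    for score, i, j in candidates:
        if i in succ or j in pred:
            continue  # each fragment gets at most one successor and predecessor
        end = j
        while end in succ:  # walk to the end of the chain that starts at j
            end = succ[end]
        if end == i:
            continue  # accepting i -> j would close a cycle
        succ[i], pred[j] = j, i

    chains = []
    for start in range(len(fragments)):
        if start not in pred:
            chain = [start]
            while chain[-1] in succ:
                chain.append(succ[chain[-1]])
            chains.append(chain)
    return chains

fragments = ["GGACCATAGA", "ACGTACGTTT", "CGTTTGGACC"]
layout = greedy_layout(fragments)
```

For the three toy fragments, `layout` lists the fragment indices of each chain in left-to-right order. The filter only decides which pairs reach the quadratic-time alignment step; the alignment parameters are the part that the abstract says must be chosen to tolerate sequencing errors while penalizing differences between repeat copies.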
