Optimal alignment of multiple sequence alignments

作者: John D. Kececioglu , Dean Matthew Starrett

DOI:

关键词: Computational problemSequenceComputer scienceHeuristicComputational complexity theoryCombinatorial optimizationAlignment-free sequence analysisMultiple sequence alignmentTheoretical computer scienceAffine transformation

摘要: An essential tool in biology is the alignment of multiple sequences. Biologists use sequence alignments for tasks such as predicting protein structure and function, reconstructing phylogenetic trees, finding motifs. Constructing high-quality computationally hard, both theory practice, typically done using heuristic methods. The majority state-of-the-art programs employ a form polish strategy, where construction phase, an initial formed by progressively merging smaller alignments, starting with single Then local-search resulting polished repeatedly splitting it into re-merging. This basic computational problem phases best heuristics, called Aligning Alignments Problem. Under sum-of-pairs objective scoring this may seem to be simple extension two-sequence alignment. It proven here, however, that affine gap costs (which are recognized necessary get biologically-informative alignments) NP-complete when gaps counted exactly. Interestingly, polynomial-time solvable we relax exact count, showing counts themselves inherently hard Unlike general show tractable demonstrating effective algorithm fast implementation. Our software AlignAlign time- space-efficient on biological data. Computational experiments data instances derived from standard benchmark suites can optimally aligned surprising efficiency, simulated time space scale well.

参考文章(70)
Lusheng Wang, Dan Gusfield, Improved Approximation Algorithms for Tree Alignment combinatorial pattern matching. pp. 220- 233 ,(1996) , 10.1007/3-540-61258-0_17
Chuong B. Do, Samuel S. Gross, Serafim Batzoglou, CONTRAlign: Discriminative Training for Protein Sequence Alignment Lecture Notes in Computer Science. pp. 160- 174 ,(2006) , 10.1007/11732990_15
Eagu Kim, John Kececioglu, Inverse sequence alignment from partial examples workshop on algorithms in bioinformatics. pp. 359- 370 ,(2007) , 10.1007/978-3-540-74126-8_33
Bin Ma, Zhuozhi Wang, Kaizhong Zhang, None, Alignment between two multiple alignments combinatorial pattern matching. pp. 254- 265 ,(2003) , 10.1007/3-540-44888-8_19
Jean-Michel Richer, Vincent Derrien, Jin-Kao Hao, A new dynamic programming algorithm for multiple sequence alignment conference on combinatorial optimization and applications. pp. 52- 61 ,(2007) , 10.1007/978-3-540-73556-4_8
John Kececioglu, The Maximum Weight Trace Problem in Multiple Sequence Alignment combinatorial pattern matching. pp. 106- 119 ,(1993) , 10.1007/BFB0029800
D. Gusfield, P. Stelling, [28] Parametric and inverse-parametric sequence alignment with XPARAL Methods in Enzymology. ,vol. 266, pp. 481- 494 ,(1996) , 10.1016/S0076-6879(96)66030-3
V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals Soviet physics. Doklady. ,vol. 10, pp. 707- 710 ,(1966)