作者: John D. Kececioglu , Dean Matthew Starrett
DOI:
关键词: Computational problem 、 Sequence 、 Computer science 、 Heuristic 、 Computational complexity theory 、 Combinatorial optimization 、 Alignment-free sequence analysis 、 Multiple sequence alignment 、 Theoretical computer science 、 Affine transformation
摘要: An essential tool in biology is the alignment of multiple sequences. Biologists use sequence alignments for tasks such as predicting protein structure and function, reconstructing phylogenetic trees, finding motifs. Constructing high-quality computationally hard, both theory practice, typically done using heuristic methods. The majority state-of-the-art programs employ a form polish strategy, where construction phase, an initial formed by progressively merging smaller alignments, starting with single Then local-search resulting polished repeatedly splitting it into re-merging. This basic computational problem phases best heuristics, called Aligning Alignments Problem. Under sum-of-pairs objective scoring this may seem to be simple extension two-sequence alignment. It proven here, however, that affine gap costs (which are recognized necessary get biologically-informative alignments) NP-complete when gaps counted exactly. Interestingly, polynomial-time solvable we relax exact count, showing counts themselves inherently hard Unlike general show tractable demonstrating effective algorithm fast implementation. Our software AlignAlign time- space-efficient on biological data. Computational experiments data instances derived from standard benchmark suites can optimally aligned surprising efficiency, simulated time space scale well.