Authors: Marina Alexandersson, Simon Cawley, Lior Pachter
DOI: 10.1101/GR.424203
Keywords:
Abstract: Comparative-based gene recognition is driven by the principle that conserved regions between related organisms are more likely than divergent regions to be coding. We describe a probabilistic framework for gene structure and alignment that can be used to simultaneously find both the gene structure and alignment of two syntenic genomic regions. A key feature of the method is the ability to enhance gene predictions by finding the best alignment between two syntenic sequences, while at the same time finding biologically meaningful alignments that preserve the correspondence between coding exons. Our probabilistic framework is the generalized pair hidden Markov model, a hybrid of (1) generalized hidden Markov models, which have been used previously for gene finding, and (2) pair hidden Markov models, which have applications to sequence alignment. We have built a gene finding and alignment program called SLAM, which aligns and identifies complete exon/intron structures of genes in two related but unannotated sequences of DNA. SLAM is able to reliably predict gene structures for any suitably related pair of organisms, most notably with fewer false-positive predictions compared to previous methods (examples are provided for Homo sapiens/Mus musculus and Plasmodium falciparum/Plasmodium vivax comparisons). Accuracy is obtained by distinguishing conserved noncoding sequence (CNS) from conserved coding sequence. CNS annotation is a novel feature of SLAM and may be useful for the annotation of UTRs, regulatory elements, and other noncoding features.