An MCMC algorithm for detecting short adjacent repeats shared by multiple sequences

Authors: Xiaodan Fan, Tong Liang, Shuo-Yen R. Li, Qiwei Li

DOI: 10.1093/BIOINFORMATICS/BTR287

Keywords: Structure (mathematical logic), Biological data, String searching algorithm, Repeated sequence, Mathematics, Algorithm, Statistical inference, Task (computing), Source code, Signal processing

Abstract: Motivation: Repeat detection problems are traditionally formulated as string matching or signal processing problems. These formulations cannot readily handle gaps between repeat units and are incapable of detecting patterns shared by multiple sequences. This study detects short adjacent repeats with interunit insertions from multiple sequences. For biological sequences, such studies can shed light on molecular structure, function, and evolution. Results: The task is formulated as a statistical inference problem using a probabilistic generative model. A Markov chain Monte Carlo algorithm is proposed to infer the parameters in a de novo fashion. Its applications to synthetic and real data show that the new method not only has a competitive edge over existing methods, but also provides a way to study the structure and evolution of repeat-containing genes. Availability: The related C++ source code and datasets are available at http://ihome.cuhk.edu.hk/%7Eb118998/share/BASARD.zip. Contact: xfan@sta.cuhk.edu.hk Supplementary information: Supplementary data are available at Bioinformatics online.
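
To make the phrase "short adjacent repeats with interunit insertions" concrete, the following is a minimal Python sketch of a toy generative process. It is not the model from the paper: the function name `generate_sequence`, the uniform background, the exact (mutation-free) motif copies, and the uniform gap lengths are all assumptions chosen only for illustration.

```python
import random


def generate_sequence(motif, n_units, max_gap, flank, rng, alphabet="ACGT"):
    """Toy generator (hypothetical, not the paper's model).

    Emits `flank` background letters, then `n_units` exact copies of
    `motif` separated by 0..max_gap random inserted letters, then
    `flank` more background letters.
    Returns the sequence and the start position of each repeat unit.
    """
    def background(k):
        # Assumption: background letters are i.i.d. uniform over the alphabet.
        return "".join(rng.choice(alphabet) for _ in range(k))

    parts = [background(flank)]
    starts = []
    pos = flank
    for i in range(n_units):
        if i > 0:
            # Inter-unit insertion: a short random gap between adjacent units.
            gap = rng.randint(0, max_gap)
            parts.append(background(gap))
            pos += gap
        starts.append(pos)
        parts.append(motif)
        pos += len(motif)
    parts.append(background(flank))
    return "".join(parts), starts


if __name__ == "__main__":
    rng = random.Random(0)
    # Several sequences sharing the same repeat pattern.
    for _ in range(3):
        seq, starts = generate_sequence("ACGTAC", n_units=3, max_gap=2, flank=5, rng=rng)
        print(seq, starts)
```

An inference method of the kind described in the abstract would work in the opposite direction: given only the sequences, it would estimate the shared motif and the unit positions that a generator like this hides.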

References (29)
J. L. Weber, P. E. May. Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. American Journal of Human Genetics, vol. 44, pp. 388-396 (1989).
William J. Murphy, Eduardo Eizirik, Warren E. Johnson, Ya Ping Zhang, Oliver A. Ryder, Stephen J. O'Brien. Molecular phylogenetics and the origins of placental mammals. Nature, vol. 409, pp. 614-618 (2001). DOI: 10.1038/35054550
Mayetri Gupta, Jun S. Liu. Discovery of Conserved Sequence Patterns Using a Stochastic Dictionary Model. Journal of the American Statistical Association, vol. 98, pp. 55-66 (2003). DOI: 10.1198/016214503388619094
Ravi Gupta, Divya Sarthi, Ankush Mittal, Kuldip Singh. A novel signal processing measure to identify exact and inexact tandem repeat patterns in DNA sequences. EURASIP Journal on Bioinformatics and Systems Biology, vol. 2007, pp. 3-3 (2007). DOI: 10.1155/2007/43596
Svend Arild Larsen, Line Mogensen, Rune Dietz, Hans Jørgen Baagøe, Mogens Andersen, Thomas Werge, Henrik Berg Rasmussen. Identification and characterization of tandem repeats in exon III of dopamine receptor D4 (DRD4) genes from different mammalian species. DNA and Cell Biology, vol. 24, pp. 795-804 (2005). DOI: 10.1089/DNA.2005.24.795
Marie-France Sagot, Eugene W. Myers. Identifying satellites and periodic repetitions in biological sequences. Journal of Computational Biology, vol. 5, pp. 539-553 (1998). DOI: 10.1089/CMB.1998.5.539
C. Lawrence, S. Altschul, M. Boguski, J. Liu, A. Neuwald, J. Wootton. Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment. Science, vol. 262, pp. 208-214 (1993). DOI: 10.1126/SCIENCE.8211139
O. Schoots, H. H. M. Van Tol. The human dopamine D4 receptor repeat sequences modulate expression. Pharmacogenomics Journal, vol. 3, pp. 343-348 (2003). DOI: 10.1038/SJ.TPJ.6500208
J. Hoh, S. Jin, T. Parrado, J. Edington, A. J. Levine, J. Ott. The p53MH algorithm and its application in detecting p53-responsive genes. Proceedings of the National Academy of Sciences of the United States of America, vol. 99, pp. 8467-8472 (2002). DOI: 10.1073/PNAS.132268899