Statistical methods for detecting periodic fragments in DNA sequence data

Authors: Julien Epps, Hua Ying, Gavin A Huttley

DOI: 10.1186/1745-6150-6-21

Abstract: Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed. We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, the discrete Fourier transform (DFT), the integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study, where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~21% and ~19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS). For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the presence of eroded periodicity. The autocorrelation method was identified as poorly suited for use with the blockwise bootstrap. Application of our methods to the genomes of two model organisms revealed that a striking proportion of the yeast and mouse genomes are spanned by NPS. Despite their markedly different sizes, roughly equivalent proportions (19-21%) of the genomes lie within period-10 spans of the NPS dinucleotides {AA, TT, TA}. The biological significance of these regions remains to be demonstrated. To facilitate this, the genomic coordinates are available as Additional files 1, 2, and 3 in a format suitable for visualisation as tracks on popular genome browsers. This article was reviewed by Prof Tomas Radivoyevitch, Dr Vsevolod Makeev (nominated by Dr Mikhail Gelfand), and Dr Rob D Knight.
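
As a rough illustration of the confirmatory test summarised in the abstract, the Python sketch below scores a sequence at one fixed integer period and then assesses that score with a blockwise bootstrap. It is not the authors' implementation. The function names, the 0/1 indicator encoding of the {AA, TT, TA} dinucleotides, the block length of 7, the number of replicates and the power normalisation are all assumptions made for illustration.

```python
import cmath
import random

# Dinucleotide set named in the abstract; encoding it as a 0/1 signal is an assumption.
NPS_DINUCS = {"AA", "TT", "TA"}


def indicator(seq, motifs=NPS_DINUCS):
    """Return 1 at each position where a dinucleotide from `motifs` starts, else 0."""
    seq = seq.upper()
    return [1 if seq[i:i + 2] in motifs else 0 for i in range(len(seq) - 1)]


def ipdft_power(signal, period):
    """Power of the mean-centred signal at exactly 1/period cycles per base.

    Unlike a standard DFT bin, the frequency is set by the integer period,
    not by the sequence length.
    """
    n = len(signal)
    mean = sum(signal) / n
    total = sum((x - mean) * cmath.exp(-2j * cmath.pi * i / period)
                for i, x in enumerate(signal))
    return abs(total) ** 2 / n


def blockwise_resample(seq, block_len, rng):
    """Build a replicate by drawing whole blocks of `seq` with replacement."""
    blocks = [seq[i:i + block_len] for i in range(0, len(seq), block_len)]
    return "".join(rng.choice(blocks) for _ in blocks)


def bootstrap_pvalue(seq, period=10, block_len=7, n_rep=200, seed=0):
    """Return (observed power, one-sided bootstrap p-value) for `period`.

    block_len and n_rep are hypothetical defaults, not values from the paper.
    """
    rng = random.Random(seed)
    # Trim so every block is full and replicates match the observed length.
    trimmed = seq[:len(seq) - len(seq) % block_len]
    observed = ipdft_power(indicator(trimmed), period)
    hits = sum(
        ipdft_power(indicator(blockwise_resample(trimmed, block_len, rng)), period)
        >= observed
        for _ in range(n_rep)
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_rep + 1)


if __name__ == "__main__":
    # Synthetic sequence with an exact period-10 AA signal.
    synthetic = ("AA" + "GC" * 4) * 30
    power, p_value = bootstrap_pvalue(synthetic)
    print(f"power={power:.3f}, p={p_value:.4f}")
```

The block length in this sketch is deliberately not a multiple of the tested period, so resampled blocks land at scrambled phases while short-range composition inside each block is preserved. A block length equal to the period would leave the period-10 phase intact in every replicate and make the test uninformative.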

References (45)
Hanspeter Herzel, Matthias E. Futschik, Lokesh Kumar, DNA motifs and sequence periodicities. In Silico Biology, vol. 6, pp. 71-78 (2006)
Miika Ahdesmäki, Harri Lähdesmäki, Ron Pearson, Heikki Huttunen, Olli Yli-Harja, Robust detection of periodic time series measured from biological systems. BMC Bioinformatics, vol. 6, p. 117 (2005), 10.1186/1471-2105-6-117
B.D. Silverman, R. Linsker, A measure of DNA periodicity. Journal of Theoretical Biology, vol. 118, pp. 295-300 (1986), 10.1016/S0022-5193(86)80060-1
Wentian Li, The study of correlation structures of DNA sequences: a critical review. Computational Biology and Chemistry, vol. 21, pp. 257-271 (1997), 10.1016/S0097-8485(97)00022-3
Hanspeter Herzel, Edward N Trifonov, Olaf Weiss, I Grosse, Interpreting correlations in biosequences. Physica A: Statistical Mechanics and its Applications, vol. 249, pp. 449-459 (1998), 10.1016/S0378-4371(97)00505-0
Sandra C. Satchwell, Horace R. Drew, Andrew A. Travers, Sequence periodicities in chicken nucleosome core DNA. Journal of Molecular Biology, vol. 191, pp. 659-675 (1986), 10.1016/0022-2836(86)90452-3
D. Rife, R. Boorstyn, Single tone parameter estimation from discrete-time observations. IEEE Transactions on Information Theory, vol. 20, pp. 591-598 (1974), 10.1109/TIT.1974.1055282
E. N. Trifonov, J. L. Sussman, The pitch of chromatin DNA is reflected in its nucleotide sequence. Proceedings of the National Academy of Sciences of the United States of America, vol. 77, pp. 3816-3820 (1980), 10.1073/PNAS.77.7.3816
Richard F. Voss, Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Physical Review Letters, vol. 68, pp. 3805-3808 (1992), 10.1103/PHYSREVLETT.68.3805