Biological Sequence Data Mining

Abstract: Biologists have determined that the control and regulation of gene expression is governed primarily by relatively short sequences in the region surrounding a gene. These short sequences vary in length, position, redundancy, orientation, and bases. Finding them is a fundamental problem in molecular biology with important applications. Though there exist many different approaches to signal/motif (i.e., short sequence) finding, in 2000 Pevzner and Sze reported that most current motif-finding algorithms are incapable of detecting the target signals in their so-called Challenge Problem. In this paper, we show that, by using an iterative-restart design, our new algorithm can correctly find the targets. Furthermore, taking into account the fact that some transcription factors form a dimer or even more complex structures, and that the transcription process may sometimes involve multiple factors, we extend the original challenge problem to a more challenging one. We address the issue of combinatorial gaps of variable lengths. To demonstrate the efficacy of our algorithm, we tested it on a series of challenge problems and compared it with other representative motif-finding algorithms. In addition, to verify its feasibility in real-world applications, we also tested it on several regulatory families of yeast genes with known motifs. The purpose of this paper is two-fold. One is to introduce an improved biological data mining algorithm capable of dealing with DNA sequences. The other is to propose a research direction for the general KDD community.
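
Pevzner and Sze's Challenge Problem is a planted (l, d)-motif task. Each of 20 random DNA sequences of length 600 hides a mutated copy of an unknown 15-mer with 4 substitutions. The code below is a minimal sketch, not the paper's algorithm, whose details are not given here. It builds such an instance, then runs a generic random-restart local search: start from a random l-mer, pick each sequence's closest l-mer, recompute the majority consensus, repeat until stable, and keep the best restart. The function names (`plant_instance`, `restart_search`) and the defaults for `restarts` and `max_iter` are hypothetical and chosen only for illustration.

```python
import random

BASES = "ACGT"


def mutate(kmer, d, rng):
    """Return a copy of kmer with exactly d positions changed to a different base."""
    chars = list(kmer)
    for i in rng.sample(range(len(kmer)), d):
        chars[i] = rng.choice([b for b in BASES if b != chars[i]])
    return "".join(chars)


def plant_instance(t=20, n=600, l=15, d=4, seed=0):
    """Hypothetical helper: t random sequences of length n, each hiding a
    d-mutated copy of one random l-mer (defaults follow the (15, 4) challenge)."""
    rng = random.Random(seed)
    motif = "".join(rng.choice(BASES) for _ in range(l))
    seqs, sites = [], []
    for _ in range(t):
        s = [rng.choice(BASES) for _ in range(n)]
        pos = rng.randrange(n - l + 1)
        s[pos:pos + l] = mutate(motif, d, rng)  # overwrite l bases in place
        seqs.append("".join(s))
        sites.append(pos)
    return motif, seqs, sites


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def best_hits(consensus, seqs):
    """For each sequence, (start, distance) of its l-mer closest to consensus."""
    l = len(consensus)
    hits = []
    for s in seqs:
        dist, pos = min((hamming(consensus, s[i:i + l]), i)
                        for i in range(len(s) - l + 1))
        hits.append((pos, dist))
    return hits


def restart_search(seqs, l, restarts=50, max_iter=20, seed=1):
    """Generic random-restart consensus refinement (illustrative assumption,
    not the paper's iterative-restart method). Returns (score, consensus),
    where score is the total Hamming distance of the best hits."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        s = rng.choice(seqs)
        i = rng.randrange(len(s) - l + 1)
        consensus = s[i:i + l]  # random starting point
        for _ in range(max_iter):
            hits = best_hits(consensus, seqs)
            window = [seq[p:p + l] for seq, (p, _dist) in zip(seqs, hits)]
            # column-wise majority base becomes the new consensus
            new = "".join(max(BASES, key=col.count) for col in zip(*window))
            if new == consensus:
                break
            consensus = new
        score = sum(dist for _, dist in best_hits(consensus, seqs))
        if best is None or score < best[0]:
            best = (score, consensus)
    return best
```

A full 20 × 600, (15, 4) instance runs but is slow in pure Python. A simple local search of this kind is also not guaranteed to recover the planted motif. A smaller call such as `plant_instance(t=10, n=100, l=8, d=1)` is enough to exercise the code.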

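The extended problem involves motifs built from several parts separated by spacers of variable length, as when a dimeric transcription factor binds two half-sites. The sketch below shows what matching one such gapped two-part pattern means. It uses exhaustive scanning only, not the paper's discovery algorithm. `find_dimer_sites` and its parameters are hypothetical names chosen for illustration.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def find_dimer_sites(seq, left, right, gap_min, gap_max, max_mm=0):
    """Return (start, gap) for every occurrence of half-site `left`, followed by
    a spacer of gap_min..gap_max arbitrary bases, then half-site `right`.
    Each half-site may have up to max_mm mismatches."""
    hits = []
    L, R = len(left), len(right)
    for i in range(len(seq) - L + 1):
        if hamming(left, seq[i:i + L]) > max_mm:
            continue
        for g in range(gap_min, gap_max + 1):
            j = i + L + g  # start of the right half-site
            if j + R > len(seq):
                break  # larger gaps would also run off the end
            if hamming(right, seq[j:j + R]) <= max_mm:
                hits.append((i, g))
    return hits
```

For example, a Gal4-like CGG-N11-CCG site would correspond to `find_dimer_sites(seq, "CGG", "CCG", 11, 11)`. Widening the gap range is what makes the search combinatorial once several half-sites are involved.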

References (17)

Pavel A. Pevzner, Sing-Hoi Sze. Combinatorial Approaches to Finding Subtle Signals in DNA Sequences. Intelligent Systems for Molecular Biology (ISMB), vol. 8, pp. 269-278 (2000).

Yuh-Jyh Hu, Dennis F. Kibler, Suzanne B. Sandmeyer. Detecting Motifs from Sequences. International Conference on Machine Learning (ICML), pp. 181-190 (1999).

Martin Tompa, Saurabh Sinha. A Statistical Method for Finding Transcription Factor Binding Sites. Intelligent Systems for Molecular Biology (ISMB), vol. 8, pp. 344-354 (2000).

C. Lawrence, S. Altschul, M. Boguski, J. Liu, A. Neuwald, J. Wootton. Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment. Science, vol. 262, pp. 208-214 (1993). doi:10.1126/SCIENCE.8211139

Emily Rocke, Martin Tompa. An algorithm for finding novel gapped motifs in DNA sequences. Research in Computational Molecular Biology (RECOMB), pp. 228-233 (1998). doi:10.1145/279069.279119

J. van Helden, B. André, J. Collado-Vides. Extracting Regulatory Sites from the Upstream Region of Yeast Genes by Computational Analysis of Oligonucleotide Frequencies. Journal of Molecular Biology, vol. 281, pp. 827-842 (1998). doi:10.1006/JMBI.1998.1947

Charles E. Lawrence, Andrew A. Reilly. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins, vol. 7, pp. 41-51 (1990). doi:10.1002/PROT.340070105

Timothy L. Bailey, Charles Elkan. Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization. Machine Learning, vol. 21, pp. 51-80 (1995). doi:10.1007/BF00993379

Lisa Wodicka, Helin Dong, Michael Mittmann, Ming-Hsiu Ho, David J. Lockhart. Genome-wide expression monitoring in Saccharomyces cerevisiae. Nature Biotechnology, vol. 15, pp. 1359-1367 (1997). doi:10.1038/NBT1297-1359

Ming Li, Bin Ma, Lusheng Wang. Finding similar regions in many strings. Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing (STOC '99), pp. 473-482 (1999). doi:10.1145/301250.301376

Similar articles (3)

Computer system "gene discovery" for promoter structure analysis.

An algorithm for mining frequent patterns in biological sequence

WITHDRAWN: Biological Sequence Pattern Mining Algorithm Based on Data Index Technology
