DNA sequencing and string learning

作者: Tao Jiang , Ming Li

DOI: 10.1007/BF01192694

关键词:

摘要: In laboratories the majority of large-scale DNA sequencing is done following theshotgun strategy, which to sequence large amount relatively short fragments randomly and then heuristically find a shortest common superstring [26]. We study mathematical frameworks, under plausible assumptions, suitable for massive automated analyzing algorithms. model problem as learning string from its drawn substrings. Under certain restrictions, this may be viewed in Valiant's distribution-free case we give an efficient algorithm quantitative bound on how many examples suffice. One major obstacle our approach turns out quite well-known open question approximate set strings, raised by number authors last 10 years [9], [29], [30]. firstprovably good approximates lengthn lengthO(n logn). The works equally well even presence negative examples, i.e., when merging some strings prohibited.

参考文章(28)
Esko Ukkonen, Jorma Tarhio, Hannu Peltola, Hans Söderlund, Algorithms for Some String Matching Problems Arising in Molecular Genetics. ifip congress. pp. 59- 64 ,(1983)
James A. Storer, Data compression: methods and theory Computer Science Press, Inc.. ,(1987)
Michael Kearns, Ming Li, Learning in the Presence of Malicious Errors SIAM Journal on Computing. ,vol. 22, pp. 807- 837 ,(1993) , 10.1137/0222052
Manfred K. Warmuth, David Helmbold, Robert Sloan, Learning nested differences of intersection-closed concept classes conference on learning theory. pp. 41- 56 ,(1989) , 10.5555/93335.93341
Tao Jiang, Ming Li, On the complexity of learning strings and sequences Theoretical Computer Science. ,vol. 119, pp. 363- 371 ,(1993) , 10.1016/0304-3975(93)90167-R
David Haussler, Quantifying inductive bias: AI learning algorithms and Valiant's learning framework Artificial Intelligence. ,vol. 36, pp. 177- 221 ,(1988) , 10.1016/0004-3702(88)90002-1
Peter Friedland, Laurence H. Kedes, Discovering the secrets of DNA Communications of The ACM. ,vol. 28, pp. 1164- 1186 ,(1985) , 10.1145/4547.4550