A quality control algorithm for DNA sequencing projects

Authors: Owen White, Ted Dunning, Granger Sutton, Mark Adams, J. Craig Venter

DOI: 10.1093/NAR/21.16.3829

Keywords: Heterologous, Genome, DNA, Sequence analysis, Expressed sequence tag, Sequence alignment, Genetics, Genomic library, DNA sequencing, Biology

Abstract: Heterologous DNA sequences arising from rearrangements with the genomes of host cells, or genomic fragments from hybrid or impure tissue sources, can threaten the purity of libraries derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole-library screening is technically very difficult. Contaminating clones can be detected by sequence alignment only when related sequences are present in a database. We have developed a statistical test that identifies such clones from differences in the hexamer composition of different organisms. The test does not require that similar sequences from potential contaminants be present in a database, so in principle it can detect contamination from previously unknown sources. We applied it to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and as a peer evaluation tool. There was detectable heterogeneity in most human and C. elegans EST data sets, but it was apparently not associated with cross-species contamination. However, there was direct evidence of both yeast and bacterial contamination in some database entries annotated as human. The results obtained with the test have been confirmed by similarity searches against the relevant data sets.
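The abstract does not give the exact statistic, so the following Python sketch only illustrates the general idea of composition-based screening; it is not the authors' published test. As a swapped-in technique, it scores each EST with a log-likelihood ratio between two smoothed hexamer-frequency models: one trained on host-organism sequence and one on a suspected contaminant, such as yeast or bacterial sequence. Unlike alignment, this needs only bulk sequence from each organism to estimate hexamer frequencies, not a close relative of the contaminating clone. The function names, the pseudocount of 1, the zero threshold, and the toy training sequences are all hypothetical.

```python
import math
from collections import Counter
from itertools import product

K = 6  # word length: hexamers, as in the abstract
ALPHABET = "ACGT"
ALL_HEXAMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]  # 4**6 = 4096 words


def count_hexamers(seq: str) -> Counter:
    """Count overlapping hexamers, skipping any window that contains a non-ACGT base (e.g. N)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - K + 1):
        word = seq[i:i + K]
        if all(base in ALPHABET for base in word):
            counts[word] += 1
    return counts


def train_model(training_seqs, pseudocount: float = 1.0) -> dict:
    """Estimate pseudocount-smoothed hexamer log-frequencies for one organism.

    The pseudocount (a hypothetical default of 1) keeps unseen hexamers from getting log(0).
    """
    total = Counter()
    for seq in training_seqs:
        total.update(count_hexamers(seq))
    denom = sum(total.values()) + pseudocount * len(ALL_HEXAMERS)
    return {w: math.log((total[w] + pseudocount) / denom) for w in ALL_HEXAMERS}


def log_likelihood_ratio(seq: str, model_a: dict, model_b: dict) -> float:
    """Sum over the query's hexamers of log P(w | A) - log P(w | B); positive favours A."""
    return sum(n * (model_a[w] - model_b[w]) for w, n in count_hexamers(seq).items())


def flag_contaminants(ests: dict, host_model: dict, contaminant_model: dict,
                      threshold: float = 0.0) -> list:
    """Return IDs of ESTs whose per-hexamer score favours the contaminant model.

    `threshold` is a hypothetical cut-off; in practice it would be calibrated
    on sequences of known origin.
    """
    flagged = []
    for est_id, seq in ests.items():
        n_words = sum(count_hexamers(seq).values())
        if n_words == 0:  # too short or only ambiguous bases: nothing to score
            continue
        score = log_likelihood_ratio(seq, contaminant_model, host_model) / n_words
        if score > threshold:
            flagged.append(est_id)
    return flagged


if __name__ == "__main__":
    # Toy stand-ins for host and contaminant training data: AT-rich vs GC-rich repeats.
    host = train_model(["AATTATATTAAT" * 20])
    contam = train_model(["GCGCCGGCGGCC" * 20])
    ests = {
        "est_host": "AATTATATTAAT" * 3,
        "est_contam": "GCGCCGGCGGCC" * 3,
        "est_short": "NNNNN",
    }
    print(flag_contaminants(ests, host, contam))  # ['est_contam']
```

Overlapping hexamers are not independent, so the summed score is a heuristic ranking rather than a calibrated p-value. Dividing by the number of scored hexamers keeps short and long ESTs on a comparable scale.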
