BioDArt - catalogue of biological data artifact examples

Authors: Vladimir Brusic, Judice Koh, Anitha Veeramani, Kavitha Gopalakrishnan

DOI: 10.1109/ICBPE.2006.348608

Abstract: Information in biological data repositories continues to grow exponentially due to the increasing number of genomic and proteomic sequencing projects. As with any database, these repositories are subject to quality issues related to correctness, uniformity, completeness, and redundancy, among others. Data cleaning is a prerequisite to prevent low data quality from interfering with the accuracy of data mining and analysis. This in turn involves the detection and resolution of data artifacts (errors, discrepancies, redundancies, ambiguities, and incompleteness). Understanding the causes of data artifacts and systematically classifying them is critical to their elimination in molecular sequence databases. This paper highlights eight data artifacts found in public databases. Examples of major database records containing data artifacts are collected into the BioDArt catalogue (http://antigen.i2r.a-star.edu.sg/BioDArt).

References (18)
G. Christian Overton, Juergen Haas. Chapter 5 – Case-based reasoning driven gene annotation. New Comprehensive Biochemistry, vol. 32, pp. 65-86 (1998). DOI: 10.1016/S0167-7306(08)60462-7
Gregory D. Schuler, Jonathan A. Epstein, Hitomi Ohkawa, Jonathan A. Kans. Entrez: molecular biology database and retrieval system. Methods in Enzymology, vol. 266, pp. 141-162 (1996). DOI: 10.1016/S0076-6879(96)66012-1
M. Dean, R. Allikmets. Contamination of cDNA libraries and expressed sequence-tags databases. American Journal of Human Genetics, vol. 57, pp. 1254-1255 (1995).
G. Seluja, A. Farmer, M. McLeod, C. Harger, P. Schad. Establishing a method of vector contamination identification in database sequences. Bioinformatics, vol. 15, pp. 106-110 (1999). DOI: 10.1093/BIOINFORMATICS/15.2.106
Matthew Binns. Contamination of DNA database sequence entries with Escherichia coli insertion sequences. Nucleic Acids Research, vol. 21, p. 779 (1993). DOI: 10.1093/NAR/21.3.779
Satoru Miyazaki, Hideaki Sugawara, Takashi Gojobori, Yoshio Tateno. DNA Data Bank of Japan (DDBJ) in XML. Nucleic Acids Research, vol. 31, pp. 13-16 (2003). DOI: 10.1093/NAR/GKG088
Owen White, Ted Dunning, Granger Sutton, Mark Adams, J. Craig Venter, Chris Fields. A quality control algorithm for DNA sequencing projects. Nucleic Acids Research, vol. 21, pp. 3829-3838 (1993). DOI: 10.1093/NAR/21.16.3829
Katherine G. Herbert, Narain H. Gehani, William H. Piel, Jason T. L. Wang, Cathy H. Wu. BIO-AJAX. ACM SIGMOD Record, vol. 33, pp. 51-57 (2004). DOI: 10.1145/1024694.1024703
Edward D. Lamperti, J. Matthew Kittelberger, Temple F. Smith, Lydia Villa-Komaroff. Corruption of genomic databases with anomalous sequence. Nucleic Acids Research, vol. 20, pp. 2741-2747 (1992). DOI: 10.1093/NAR/20.11.2741