BioDArt - catalogue of biological data artifact examples

Authors: Vladimir Brusic, Judice Koh, Anitha Veeramani, Kavitha Gopalakrishnan

DOI: 10.1109/ICBPE.2006.348608


Abstract: Information in biological data repositories continues to grow exponentially due to the increasing number of genomic and proteomic sequencing projects. As with any database, these repositories are subject to data quality issues related to correctness, uniformity, completeness, and redundancy, among others. Data cleaning is a prerequisite to prevent low-quality data from interfering with the accuracy of data mining and analysis. This in turn involves the detection and resolution of data artifacts (errors, discrepancies, redundancies, ambiguities, incompleteness). Understanding the causes of data artifacts and systematically classifying them is critical towards their elimination in molecular sequence databases. This paper highlights eight artifacts found in public databases. Examples of major database records containing artifacts are collected into the BioDArt catalogue (http://antigen.i2r.a-star.edu.sg/BioDArt).


