Authors: Vladimir Brusic, Judice Koh, Anitha Veeramani, Kavitha Gopalakrishnan
DOI: 10.1109/ICBPE.2006.348608
Keywords:
Abstract: Information in biological data repositories continues to grow exponentially due to the increasing number of genomic and proteomic sequencing projects. As with any database, these repositories are subject to data quality issues related to correctness, uniformity, completeness, and redundancy, among others. Data cleaning is a prerequisite for preventing low data accuracy from interfering with data mining and analysis. This in turn involves the detection and resolution of data artifacts (errors, discrepancies, redundancies, ambiguities, and incompleteness). Understanding the causes of these artifacts and systematically classifying them is critical to their elimination from molecular sequence databases. This paper highlights eight types of artifacts found in public molecular sequence databases. Examples of records from major databases that contain such artifacts are collected in the BioDArt catalogue (http://antigen.i2r.a-star.edu.sg/BioDArt).