Database support for top-down proteomics

Authors: Geneva Belford, Yong-Bin Kim

Keywords: Database, Database schema, Identification (information), Data warehouse, Computer science, Relational database, Data mining, Software suite, Data integration, Database search engine, Population

Abstract: Top-down proteomics is a revolutionary application for the identification and characterization of proteins, known to be one of the most complicated and challenging issues in biology. In top-down proteomics, the quality and speed of the data warehouse are very important, as highly accurate results are returned by database search. ProSight Warehouse fills a critical role in ProSight PTM, the first publicly available software suite for top-down proteomics. MySQL, a free relational database, was used as the base of this warehouse. Many annotated and predicted protein forms have been successfully incorporated into organism-specific integrated databases, including human and strains. To achieve efficiency, a schema (Absolute Mass Search), annotation methods (Shotgun Annotation and Extended Shotgun Annotation), population strategies (on-the-fly population and a bulk-loading method), and an integration methodology were developed. With the successful implementation of ProSight Warehouse, ProSight PTM achieved its aspiration: highly accurate protein characterization.
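
The abstract names an absolute mass search over a relational warehouse and a bulk-loading population strategy but does not show either. The sketch below illustrates the general idea only: it is not the ProSight Warehouse schema or code. It assumes a hypothetical table `protein_form` with a `mass_da` column, uses Python's built-in `sqlite3` as a stand-in for MySQL, and loads made-up toy masses that are not real protein data.

```python
import sqlite3


def build_toy_warehouse():
    """Create an in-memory database with a hypothetical protein_form table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein_form ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, mass_da REAL NOT NULL)"
    )
    # An index on mass lets the range query below avoid a full table scan.
    conn.execute("CREATE INDEX idx_protein_form_mass ON protein_form (mass_da)")
    # Toy rows (invented values), inserted in one batch as a simple
    # stand-in for a bulk-loading step.
    rows = [
        ("form_A", 10000.0),
        ("form_A_plus_mod", 10042.0),
        ("form_B", 15000.5),
        ("form_C", 10001.5),
    ]
    conn.executemany("INSERT INTO protein_form (name, mass_da) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def absolute_mass_search(conn, observed_mass_da, tolerance_da):
    """Return (name, mass) pairs whose stored mass lies within the tolerance
    window around the observed intact mass, closest match first."""
    low = observed_mass_da - tolerance_da
    high = observed_mass_da + tolerance_da
    cur = conn.execute(
        "SELECT name, mass_da FROM protein_form "
        "WHERE mass_da BETWEEN ? AND ? "
        "ORDER BY ABS(mass_da - ?), name",
        (low, high, observed_mass_da),
    )
    return cur.fetchall()


conn = build_toy_warehouse()
hits = absolute_mass_search(conn, 10000.5, 2.0)
for name, mass in hits:
    print(name, mass)
```

With a 2 Da window around 10000.5, only the two toy forms near 10000 Da fall inside the range; the index on `mass_da` is the design choice that keeps such window queries fast as the table grows.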
