Analyzing large-scale proteomics projects with latent semantic indexing.

Authors: Sebastian Klie, Lennart Martens, Juan Antonio Vizcaíno, Richard Côté, Phil Jones

DOI: 10.1021/PR070461K

Keywords:

Abstract: Since the advent of public data repositories for proteomics data, readily accessible results from high-throughput experiments have been accumulating steadily. Several large-scale projects in particular have contributed substantially to the number of identifications available to the community. Despite the considerable body of information amassed, very few successful analyses have been performed and published on these data, leaving the ultimate value of these projects far below their potential. A prominent reason that published proteomics data are seldom reanalyzed lies in the heterogeneous nature of the original sample collection and of the subsequent data recording and processing. To illustrate that at least part of this heterogeneity can be compensated for, we here apply a latent semantic analysis to the data contributed to the Human Proteome Organization's Plasma Proteome Project (HUPO PPP). Interestingly, despite the broad spectrum of instruments and methodologies applied in the HUPO PPP, our analysis reveals several obvious patterns that can be used to formulate concrete recommendations for optimizing project planning as well as the choice of technologies used in future experiments. It is clear that analyzing large bodies of publicly available proteomics data with noise-tolerant algorithms such as latent semantic analysis holds great promise that is currently underexploited.
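Latent semantic analysis, in the information-retrieval sense (Salton and McGill), factorizes a term-by-document matrix with a truncated singular value decomposition (SVD) and compares documents in the reduced space. The sketch below illustrates only that general idea on a hypothetical binary matrix of experiments (rows) by identified proteins (columns). The matrix layout, the toy data, and the helper names `gram`, `top_eigenpairs`, `lsa_coordinates` and `cosine` are assumptions for illustration, not the authors' pipeline or the HUPO PPP data. To stay dependency-free, it obtains the truncated SVD from the eigenvectors of A·Aᵀ by power iteration with deflation instead of calling a linear-algebra library.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gram(rows):
    """Return G = A A^T, where A is given as a list of equal-length rows."""
    return [[dot(r, s) for s in rows] for r in rows]


def top_eigenpairs(g, k, iters=1000, seed=0):
    """Top-k (eigenvalue, unit eigenvector) pairs of a symmetric PSD matrix,
    found by power iteration followed by deflation."""
    rng = random.Random(seed)
    m = len(g)
    g = [row[:] for row in g]  # copy: deflation below modifies the matrix
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]  # nonzero start vector
        n = math.sqrt(dot(v, v))
        v = [x / n for x in v]
        for _ in range(iters):
            w = [dot(row, v) for row in g]
            n = math.sqrt(dot(w, w))
            if n < 1e-12:  # remaining spectrum is (numerically) zero
                break
            v = [x / n for x in w]
        lam = dot(v, [dot(row, v) for row in g])  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: subtract the component just found so the next pass
        # converges to the next-largest eigenpair.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return pairs


def lsa_coordinates(rows, k):
    """Rows of U_k * Sigma_k: each experiment as a point in a k-dim latent space.
    The eigenvectors of A A^T are the left singular vectors of A, and its
    eigenvalues are the squared singular values, so sigma_i = sqrt(lambda_i)."""
    pairs = top_eigenpairs(gram(rows), k)
    return [
        [math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
        for i in range(len(rows))
    ]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: 4 experiments (rows) x 6 proteins (columns);
    # 1 means the protein was identified in that experiment.
    ids = [
        [1, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1, 0],
    ]
    coords = lsa_coordinates(ids, k=2)
    print(round(cosine(coords[0], coords[1]), 3), round(cosine(coords[0], coords[2]), 3))
```

Dot products between rows of U_kΣ_k equal those between rows of the rank-k approximation A_k, so comparing these coordinates amounts to comparing experiments after low-rank truncation. For real data, a library routine such as NumPy's `numpy.linalg.svd` would be the more practical choice than hand-rolled power iteration.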

References (26)
Gerard Salton, Michael J. McGill. Introduction to Modern Information Retrieval (1983).
Gilbert S Omenn, David J States, Marcin Adamski, Thomas W Blackwell, Rajasree Menon, Henning Hermjakob, Rolf Apweiler, Brian B Haab, Richard J Simpson, James S Eddes, Eugene A Kapp, Robert L Moritz, Daniel W Chan, Alex J Rai, Arie Admon, Ruedi Aebersold, Jimmy Eng, William S Hancock, Stanley A Hefta, Helmut Meyer, Young‐Ki Paik, Jong‐Shin Yoo, Peipei Ping, Joel Pounds, Joshua Adkins, Xiaohong Qian, Rong Wang, Valerie Wasinger, Chi Yue Wu, Xiaohang Zhao, Rong Zeng, Alexander Archakov, Akira Tsugita, Ilan Beer, Akhilesh Pandey, Michael Pisano, Philip Andrews, Harald Tammen, David W Speicher, Samir M Hanash. Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics, vol. 5, pp. 3226-3245 (2005), 10.1002/PMIC.200500358
John T Prince, Mark W Carlson, Rong Wang, Peng Lu, Edward M Marcotte. The need for a public proteomics repository. Nature Biotechnology, vol. 22, pp. 471-472 (2004), 10.1038/NBT0404-471
Parag Mallick, Markus Schirle, Sharon S Chen, Mark R Flory, Hookeun Lee, Daniel Martin, Jeffrey Ranish, Brian Raught, Robert Schmitt, Thilo Werner, Bernhard Kuster, Ruedi Aebersold. Computational prediction of proteotypic peptides for quantitative proteomics. Nature Biotechnology, vol. 25, pp. 125-131 (2007), 10.1038/NBT1275
Joshua N. Adkins, Matthew E. Monroe, Kenneth J. Auberry, Yufeng Shen, Jon M. Jacobs, David G. Camp, Frank Vitzthum, Karin D. Rodland, Richard C. Zangar, Richard D. Smith, Joel G. Pounds. A proteomic study of the HUPO Plasma Proteome Project's pilot samples using an accurate mass and time tag strategy. Proteomics, vol. 5, pp. 3454-3466 (2005), 10.1002/PMIC.200401333
Ruedi Aebersold, Matthias Mann. Mass spectrometry-based proteomics. Nature, vol. 422, pp. 198-207 (2003), 10.1038/NATURE01511
Nina Zolotarjova, James Martosella, Gordon Nicol, Jerome Bailey, Barry E. Boyes, William C. Barrett. Differences among techniques for high-abundant protein depletion. Proteomics, vol. 5, pp. 3304-3313 (2005), 10.1002/PMIC.200402021
Andrew Smellie. Accelerated K-Means Clustering in Metric Spaces. Journal of Chemical Information and Computer Sciences, vol. 44, pp. 1929-1935 (2004), 10.1021/CI0499222
Robertson Craig, John P. Cortens, Ronald C. Beavis. Open source system for analyzing, validating, and storing protein identification data. Journal of Proteome Research, vol. 3, pp. 1234-1242 (2004), 10.1021/PR049882H
Bruno Domon, Ruedi Aebersold. Mass Spectrometry and Protein Analysis. Science, vol. 312, pp. 212-217 (2006), 10.1126/SCIENCE.1124619