Authors: Pjotr Prins, Joep de Ligt, Artem Tarasov, Ritsert C Jansen, Edwin Cuppen
DOI: 10.1038/NBT.3240
Keywords:
Abstract: Leading scientists tell us that the problem of large data and data integration, referred to as 'big data', is acute and is hurting research. Recently, Snijder et al.1 suggested a culture change in which scientists would aim to share high-dimensional data among laboratories. It is important to realize that data sharing is only part of the solution. The elephant in the room is bioinformatics, and software development in particular, which, despite being crucially important, mostly fails to address the requirements of 'big data'. Whereas Internet companies such as Google, Facebook and Skype have built infrastructure and developed innovative solutions to cope with vast amounts of data, the bioscience community seems to be struggling with its big data projects. This has led to problems with data sharing, annotation, computation and reproducibility2, 3, 4. Before we can devise solutions for big data, there are more basic and pressing concerns that need to be resolved.

Biologists are not formally trained in software engineering, so much of the bioinformatics software available today has been developed by PhD biologists working in relative isolation on the back of funded experimental research programs. This model, tied to wet-lab work, has worked well but has resulted in 'one-offs'. In most projects the aim is to obtain results in the shortest possible time, and this is often achieved by writing prototype software rather than by developing well-engineered, scalable solutions. Even when funding is obtained to develop software, usually no long-term resources are allocated to maintenance, bug fixing, continuity and reproducibility.

Instead of working alone, researchers should join or start collaborative free and open-source software (FOSS) projects, thereby improving their coding skills through scrutiny by their peers. True FOSS licenses allow the continuation of projects that were abandoned by their original developers, and they enable modular development. We have published a manifesto and practical guide to FOSS-style software development (https://github.com/pjotrp/bioinformatics/blob/master/README.md), which aims to provide process and architecture guidelines for early-career bioinformaticians and their supervisors.

Bioinformatics already has vibrant FOSS projects, such as Galaxy, Cytoscape, BioPerl and Biopython, but these are worked on part-time owing to a lack of funding, and at the current, inadequate funding levels they will not be able to serve biology without major additional investment.
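For example, after initial funding from the US National Institutes of Health (NIH) and the National Science Foundation (NSF), the Galaxy project is now seeking new funding to continue its work, and no funds at all have been granted by scientific agencies to Biopython. The amount of funding dedicated to bioinformatics software remains small. The NIH budget is $30 billion, of which an estimated 2–4% goes to bioinformatics grants, and we estimate that only a small fraction of that is used for software development. By comparison, the nonprofit Mozilla turns over $300 million annually for software development and promotion, and Google invests $6.7 billion in R&D. Our recommendations are to emphasize FOSS approaches; build on existing grassroots initiatives5; create split funding streams for software and hardware; support the maintenance of projects; encourage collaboration with experts in high-performance computing and software engineering; fund larger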