Parallelization of local BLAST service on workstation clusters

Authors: K.T. Pedretti, T.L. Casavant, T.E. Scheetz, C.L. Birkett, C.A. Roberts

DOI: 10.1016/S0167-739X(00)00057-1

Keywords: DNA sequencing; Distributed computing; Server; Set (abstract data type); Human genome; Functional genomics; Service (systems architecture); Mode (computer interface); Sequence; Parallel computing; Gene; Distributed database; Genome project; Computer science

Abstract: This paper describes approaches to improve the performance of one of the most common and increasingly important aspects of the Human Genome Project (HGP): large-volume, batch comparison of DNA sequence data. The basic operation, usually carried out by the well-known BLAST program on a subject sequence against the internationally available databases of nearly five million target sequences, is already used hundreds of thousands of times each day by researchers around the world. At present, it is still used primarily in single-query or small-query-set mode. As the sequencing of the entire human genome nears completion, the area of functional genomics, and the use of micro-arrays of sets of genes, is coming to the fore. These developments will demand ever more efficient means of BLASTing sets of data, which will make single-processor implementation on powerful workstations infeasible. We describe three primary parallel components of BLAST. The first is at the sequence-to-sequence level. The second parallelizes a query across a partitioned and distributed database. Finally, the set of queries themselves is distributed across a set of servers with replicated databases. The methods may be employed alone or in concert. Our current implementation, which parallelizes sets of requests, is described, and our plans for implementing the other levels of parallelism are also described. The results will ultimately be applied to hardware assistance for this soon-to-be primitive computer operation.
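The second and third levels named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_score` is a hypothetical stand-in for a BLAST comparison, all function names are invented for illustration, and threads on a single machine stand in for workstations in a cluster. The first (sequence-to-sequence) level is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_score(query: str, target: str) -> int:
    """Hypothetical stand-in for BLAST: count positions where bases match."""
    return sum(a == b for a, b in zip(query, target))


def search_partition(query, partition):
    """Score one query against every target in one database partition."""
    return [(toy_score(query, seq), tid) for tid, seq in partition.items()]


def search_partitioned(query, partitions, top=1):
    """Level 2: run one query against each partition in parallel, then merge."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda p: search_partition(query, p), partitions))
    hits = [hit for part in results for hit in part]
    # Best score first; ties broken by target id for a deterministic order.
    return sorted(hits, key=lambda h: (-h[0], h[1]))[:top]


def search_query_set(queries, database, n_servers=2):
    """Level 3: deal queries round-robin to servers that each hold a replica."""
    batches = [queries[i::n_servers] for i in range(n_servers)]

    def serve(batch):
        # Each simulated server searches its own full copy of the database.
        return {q: search_partitioned(q, [database]) for q in batch}

    merged = {}
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        for result in pool.map(serve, batches):
            merged.update(result)
    return merged


if __name__ == "__main__":
    database = {"t1": "ACGTACGT", "t2": "ACGTTTTT", "t3": "GGGGACGT"}
    partitions = [{"t1": "ACGTACGT"}, {"t2": "ACGTTTTT", "t3": "GGGGACGT"}]
    print(search_partitioned("ACGTACGT", partitions))
    print(search_query_set(["ACGTACGT", "GGGGACGT"], database))
```

Because both functions take the database as an argument, the two levels can be combined by passing a list of partitions to each simulated server, which mirrors the abstract's remark that the methods may be used alone or in concert.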

References (6)
Robert L. Henderson, Job Scheduling Under the Portable Batch System. Job Scheduling Strategies for Parallel Processing. pp. 279-294 (1995), 10.1007/3-540-60153-8_34
J. Sulston, Z. Du, K. Thomas, R. Wilson, L. Hillier, R. Staden, N. Halloran, P. Green, J. Thierry-Mieg, L. Qiu, S. Dear, A. Coulson, M. Craxton, R. Durbin, M. Berks, M. Metzstein, T. Hawkins, R. Ainscough, R. Waterston, The C. elegans genome sequencing project: a beginning. Nature. vol. 356, pp. 37-41 (1992), 10.1038/356037A0
J. A. Blake, J. E. Richardson, M. T. Davisson, J. T. Eppig, The Mouse Genome Database (MGD). A comprehensive public resource of genetic, phenotypic and genomic data. The Mouse Genome Informatics Group. Nucleic Acids Research. vol. 25, pp. 85-91 (1997), 10.1093/NAR/25.1.85
Zheng Zhang, Webb Miller, David J Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. vol. 25, pp. 3389-3402 (1997), 10.1093/NAR/25.17.3389