Authors: Federico M. Lauro, Srikumar Venugopal, Freddie Sunarso
DOI:
Keywords:
Abstract: Metagenomics is the study of environments through genetic sampling of their microbiota. Metagenomic studies produce large datasets that are estimated to grow at a faster rate than the available computational capacity. A key step in the study of metagenome data is sequence similarity searching, which is computationally intensive over large datasets. Tools such as BLAST require large dedicated computing infrastructure to perform such analysis and may not be available to every researcher. In this paper, we propose a novel approach called ScalLoPS that performs searching on protein sequences using LSH (Locality-Sensitive Hashing), implemented using the MapReduce distributed framework. ScalLoPS is designed to scale across computing resources sourced from cloud computing providers. We present the design and implementation of ScalLoPS, followed by an evaluation with datasets derived from both traditional as well as metagenomic studies. Our experiments show that this method approximates the quality of BLAST results while improving the scalability of protein sequence search.