Authors: George Teodoro, Timothy D. R. Hartley, Umit Catalyurek, Renato Ferreira
Keywords:
Abstract: The increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most of the current tools for the development of parallel applications for hierarchical systems concentrate on the use of only a single processor type (e.g., accelerators) and do not coordinate several heterogeneous processors. Here, we show that making use of all of the heterogeneous computing resources can significantly improve application performance. Our approach, which consists of optimizing applications at run-time by efficiently coordinating application task execution on all available processing units, is evaluated in the context of replicated dataflow applications. The proposed techniques were developed and implemented in an integrated run-time system targeting both intra- and inter-node parallelism. The experimental results with a real-world complex biomedical application show that our approach nearly doubles the performance of the GPU-only implementation on a distributed heterogeneous accelerator cluster.