Authors: Guillermo L Taboada, Carlos Teijeiro, Juan Tourino, Basilio B Fraguela, Ramón Doallo
DOI: 10.1109/HPCC.2009.88
Keywords:
Abstract: Unified Parallel C (UPC) is an extension of ANSI C designed for parallel programming. UPC collective primitives, which are part of the UPC standard, increase programming productivity while reducing communication overhead. This paper presents an up-to-date performance evaluation of two publicly available UPC collective implementations on three scenarios: shared, distributed, and hybrid shared/distributed memory architectures. The characterization of the throughput of the collective primitives is useful for increasing performance through the runtime selection of the appropriate primitive implementation, which depends on the message size and the memory architecture, as well as for detecting inefficient implementations. In fact, based on the analysis of the performance of the UPC collectives, we propose some optimizations for the current UPC collective libraries. We have also compared the performance of the UPC collective primitives with their MPI counterparts, showing that there is room for improvement. Finally, this paper concludes with an analysis of the influence of the performance of the UPC collectives on a representative communication-intensive application, showing that their optimization is highly important for UPC scalability.