Towards realizing the potential of malleable jobs

Authors: Abhishek Gupta, Bilge Acun, Osman Sarood, Laxmikant V. Kale

DOI: 10.1109/HIPC.2014.7116905

Abstract: Malleable jobs are those which can dynamically shrink or expand the number of processors on which they are executing at runtime in response to an external command. Malleable jobs can significantly improve system utilization and reduce average response time, compared to traditional jobs. To realize these benefits, three components are critical: an adaptive job scheduler, an adaptive resource manager, and an adaptive parallel runtime system. In this paper, we present a novel mechanism for enabling shrink/expand capability in the parallel runtime system using task migration and dynamic load balancing, checkpoint-restart, and Linux shared memory. Our technique performs true shrink/expand, eliminating the need for any residual processes, requires little application programmer effort, and is fast. Further, we establish a bidirectional communication channel between the resource manager and the parallel runtime for asynchronous, split-phase scheduling decisions. Performance results using Charm++ on the Stampede supercomputer show the efficacy, scalability, and benefits of our approach. Shrinking from 2k to 1k cores takes 16s, while expanding takes 40s. Also, we demonstrate the utility of our runtime in traditional as well as emerging scenarios, e.g., proactive fault tolerance and clouds.

References (22)

Sayantan Chakravorty, Celso L. Mendes, Laxmikant V. Kalé. Proactive fault tolerance in MPI applications via task migration. IEEE International Conference on High Performance Computing Data and Analytics, pp. 485-496 (2006). DOI: 10.1007/11945918_47

Dror G. Feitelson, Larry Rudolph. Towards Convergence in Job Schedulers for Parallel Supercomputers. Job Scheduling Strategies for Parallel Processing, pp. 1-26 (1996). DOI: 10.1007/BFB0022284

Jan Hungershofer. On the combined scheduling of malleable and rigid jobs. Symposium on Computer Architecture and High Performance Computing, pp. 206-213 (2004). DOI: 10.1109/SBAC-PAD.2004.27

Su-Hui Chiang, Mary K. Vernon. Dynamic vs. Static Quantum-Based Parallel Processor Allocation. Job Scheduling Strategies for Parallel Processing, pp. 200-223 (1996). DOI: 10.1007/BFB0022295

Eric de Sturler, Milind Bhandarkar, L. V. Kalé. Object-Based Adaptive Load Balancing for MPI Programs (2000).

Gengbin Zheng. Achieving High Performance on Extremely Large Parallel Machines: Performance Prediction and Load Balancing (2005).

Milind Bhandarkar, Laxmikant V. Kalé, Eric de Sturler, Jay Hoeflinger. Adaptive Load Balancing for MPI Programs. International Conference on Computational Science, pp. 108-117 (2001). DOI: 10.1007/3-540-45718-6_13

Dror G. Feitelson, Larry Rudolph, Uwe Schwiegelshohn, Kenneth C. Sevcik, Parkson Wong. Theory and Practice in Parallel Job Scheduling. Job Scheduling Strategies for Parallel Processing, pp. 1-34 (1997). DOI: 10.1007/3-540-63574-2_14

Márcia C. Cera, Yiannis Georgiou, Olivier Richard, Nicolas Maillard, Philippe O. A. Navaux. Supporting malleability in parallel architectures with dynamic CPUSETs mapping and dynamic MPI. International Conference of Distributed Computing and Networking, vol. 5935, pp. 242-257 (2010). DOI: 10.1007/978-3-642-11322-2_26

Richard A. Dutton, Weizhen Mao. Online scheduling of malleable parallel jobs. IASTED International Conference on Parallel and Distributed Computing and Systems, pp. 136-141 (2007).
