Authors: Viveck R. Cadambe, Pulkit Grover, Sanghamitra Dutta
DOI:
Keywords:
Abstract: Faced with the saturation of Moore's law and the increasing size and dimension of data, system designers have increasingly resorted to parallel and distributed computing to reduce the computation time of machine-learning algorithms. However, distributed computing is often bottlenecked by a small fraction of slow processors called "stragglers," which reduce overall speed because the fusion node has to wait for all processors to complete their processing. To combat the effect of stragglers, recent literature proposes introducing redundancy in the computations across processors, e.g., using repetition-based strategies or erasure codes. The fusion node can exploit this redundancy by completing the computation using outputs from only a subset of the processors, ignoring the stragglers. In this paper, we propose a novel technique, which we call "Short-Dot," that introduces redundant computations in a coding-theory-inspired fashion for computing linear transforms of long vectors. Instead of computing the long dot products required by the original transform, we construct a larger number of short dot products that can be computed more efficiently at individual processors. Further, only a subset of these short dot products is required to finish the computation successfully. We demonstrate through probabilistic analysis as well as experiments on computing clusters that Short-Dot offers significant speed-ups compared to existing techniques. We also derive trade-offs between the length of the dot products and the resilience to stragglers (the number of processors required to finish) for any such strategy, and compare them to those achieved by our strategy.
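To make the coded-computation idea in the abstract concrete, below is a minimal NumPy sketch assuming a simple Vandermonde-based (MDS-style) encoding rather than the paper's actual Short-Dot construction: the matrix A is encoded into P > m rows so the fusion node can recover A x from the outputs of any m workers, ignoring stragglers. Short-Dot additionally makes each encoded row short (sparse), which this sketch does not attempt; the names G, F, S and the cluster size P are illustrative assumptions only.

```python
# Illustrative sketch of coded distributed matrix-vector multiplication
# (NOT the paper's Short-Dot construction): rows of A are encoded with
# redundancy so the fusion node can recover A @ x from the outputs of any
# m of the P workers, tolerating up to P - m stragglers.
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 8          # A is m x n; x has length n
P = 6                # number of (simulated) worker processors, P > m

A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Encode: G is a P x m generator matrix whose every m x m submatrix is
# invertible (a Vandermonde matrix with distinct nodes has this property).
G = np.vander(np.arange(1, P + 1, dtype=float), m, increasing=True)
F = G @ A            # worker i computes the single dot product F[i] @ x

# Simulate stragglers: only a subset S of workers (|S| = m) return results.
S = [0, 2, 3, 5]
worker_outputs = {i: F[i] @ x for i in S}

# Decode at the fusion node: solve G[S] @ (A @ x) = collected outputs.
b = np.array([worker_outputs[i] for i in S])
Ax_decoded = np.linalg.solve(G[S, :], b)

assert np.allclose(Ax_decoded, A @ x)
print(Ax_decoded)
```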