Authors: Jichan Chung, Kangwook Lee, Ramtin Pedarsani, Dimitris Papailiopoulos, Kannan Ramchandran
DOI:
Keywords:
Abstract: Modern large-scale learning algorithms are deployed on hundreds of distributed compute instances, each computing gradient updates on a subset of the training data. It has been empirically observed that these algorithms can offer better statistical performance when the training data is shuffled once every few epochs. However, data shuffling is often avoided because of its heavy communication cost. Recently, coding-theoretic ideas have been proposed to minimize the communication cost of shuffling. In this work, we implement UberShuffle, a new coded shuffling system. We observe that our shuffling framework for machine learning can achieve significant speed-ups compared to the state of the art: in some cases, the data shuffling time is reduced by about 50% and the training time by about 30%.