Authors: Aapo Kyrola, Piotr Dollár, Lukasz Wesolowski, Yangqing Jia, Andrew Tulloch
DOI:
Keywords:
Abstract: … offers a potential solution to this problem by dividing SGD … nontrivial growth in the SGD minibatch size. In this paper, we … loss of accuracy when training with large minibatch sizes up to …