Abstract: Adam is a widely used optimization method for training deep learning models. It computes individual adaptive learning rates for different parameters. In this paper, we propose a generalization of Adam, called Adambs, that allows us to also adapt to different training examples based on their importance in the model's convergence. To achieve this, we maintain a distribution over all examples, selecting a mini-batch in each iteration by sampling according to this distribution, which we update using a multi-armed bandit algorithm. This ensures that examples that are more beneficial to the model are sampled with higher probabilities. We theoretically show that Adambs improves the convergence rate of Adam, achieving $O(\sqrt{\frac{\log n}{T}})$ instead of $O(\sqrt{\frac{n}{T}})$ in some cases. Experiments on various models and datasets demonstrate Adambs's fast convergence in practice.
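To make the sampling scheme described above concrete, the following is a minimal sketch, not the paper's reference implementation, of Adam combined with an EXP3-style multi-armed bandit distribution over training examples. The logistic-regression task, the bandit weight update, and names such as `bandit_weights`, `eta`, and `per_example_loss_grad` are illustrative assumptions; the paper's exact update rules may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: n examples, d features, binary labels (assumed task).
n, d = 1000, 20
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (X @ true_w > 0).astype(float)

# Model parameters and Adam state.
w = np.zeros(d)
m, v = np.zeros(d), np.zeros(d)
beta1, beta2, lr, eps = 0.9, 0.999, 0.01, 1e-8

# Bandit state: one weight per example, inducing a sampling distribution.
bandit_weights = np.ones(n)
eta = 0.01          # bandit learning rate (assumed value)
batch_size = 32

def per_example_loss_grad(w, Xb, yb):
    """Logistic loss and gradient for each example in a mini-batch."""
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    losses = -(yb * np.log(p + 1e-12) + (1 - yb) * np.log(1 - p + 1e-12))
    grads = (p - yb)[:, None] * Xb          # shape (batch, d)
    return losses, grads

for t in range(1, 501):
    # Sample a mini-batch according to the current distribution over examples.
    probs = bandit_weights / bandit_weights.sum()
    idx = rng.choice(n, size=batch_size, p=probs)

    losses, grads = per_example_loss_grad(w, X[idx], y[idx])

    # Importance-weight the gradients so the mini-batch estimate of the
    # average gradient stays unbiased under non-uniform sampling.
    iw = 1.0 / (n * probs[idx])
    g = (iw[:, None] * grads).mean(axis=0)

    # Standard Adam update on the model parameters.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat, v_hat = m / (1 - beta1**t), v / (1 - beta2**t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)

    # EXP3-style bandit update: examples with larger importance-weighted loss
    # get their sampling weight (and hence sampling probability) increased.
    reward = losses / (n * probs[idx])
    bandit_weights[idx] *= np.exp(eta * reward / batch_size)
```

The design point the sketch illustrates is the interplay of the two updates per iteration: Adam adapts per-parameter learning rates from the importance-weighted mini-batch gradient, while the bandit update reshapes the distribution so that examples yielding more useful feedback are drawn more often in later iterations.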