Authors: Florian Tramèr, Dan Boneh, Nicolas Papernot, Ian J. Goodfellow, Patrick D. McDaniel, Alexey Kurakin
Abstract: Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We further introduce Ensemble Adversarial Training, a technique that augments training data with perturbations transferred from other models. On ImageNet, Ensemble Adversarial Training yields models with strong robustness to black-box attacks. In particular, our most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks. However, subsequent work found that more elaborate black-box attacks could significantly enhance transferability and reduce the accuracy of our models.
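The abstract names two concrete procedures: a single-step attack preceded by a random step (to escape the non-smooth vicinity of the input), and Ensemble Adversarial Training, which crafts training perturbations on separate pre-trained models. Below is a minimal PyTorch sketch of both, assuming a differentiable classifier `model` and inputs scaled to [0, 1]; the helper names (`fgsm`, `rand_fgsm`, `ensemble_adv_step`) and the loss and clamping choices are illustrative assumptions, not the authors' released code.

```python
import random

import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # Single-step attack: maximize a linear approximation of the loss.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return torch.clamp(x_adv + eps * grad.sign(), 0.0, 1.0).detach()

def rand_fgsm(model, x, y, eps, alpha):
    # Random step + FGSM: a small random step first escapes the non-smooth
    # vicinity of the input; the remaining budget goes to the gradient step.
    x_rand = torch.clamp(x + alpha * torch.randn_like(x).sign(), 0.0, 1.0)
    return fgsm(model, x_rand, y, eps - alpha)

def ensemble_adv_step(model, static_models, opt, x, y, eps):
    # Ensemble Adversarial Training step (sketch): craft perturbations on a
    # randomly chosen source model -- the model being trained or one of the
    # pre-trained static models -- so the adversarial examples are decoupled
    # from the parameters being updated.
    source = random.choice([model] + list(static_models))
    x_adv = fgsm(source, x, y, eps)
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
    return float(loss)
```

The design point the abstract emphasizes is that `ensemble_adv_step` decouples adversarial-example generation from the parameters being trained, which is what avoids the degenerate minimum in which the model merely learns to produce weak perturbations against itself.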