
Towards Intrinsic Adversarial Robustness Through Probabilistic Training

Abstract

Modern deep neural networks have achieved numerous breakthroughs in real-world applications, yet they remain vulnerable to imperceptible adversarial perturbations. These tailored perturbations can severely disrupt the inference of current deep learning-based methods and pose potential security hazards to artificial intelligence applications. To date, adversarial training methods have achieved strong robustness against various adversarial attacks by incorporating adversarial examples during the training stage. However, existing methods primarily rely on optimizing over injective adversarial examples generated one-to-one from natural examples, ignoring other potential adversaries in the adversarial domain. This optimization bias can induce overfitting to a suboptimal decision boundary, which severely jeopardizes adversarial robustness. To address this issue, we propose Adversarial Probabilistic Training (APT), which bridges the distribution gap between natural and adversarial examples by modeling the latent adversarial distribution. Instead of tedious and costly adversary sampling to form the probabilistic domain, we estimate the parameters of the adversarial distribution at the feature level for efficiency. Moreover, we decouple the distribution alignment so that it draws on both the adversarial probability model and the original adversarial example. We then devise a novel reweighting mechanism for the distribution alignment that accounts for adversarial strength and domain uncertainty. Extensive experiments demonstrate the superiority of our adversarial probabilistic training method against various types of adversarial attacks across different datasets and scenarios.
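
The sketch below illustrates the core idea in PyTorch: rather than aligning features against a single injective adversarial example, a distribution over adversarial features is estimated at the feature level, and the alignment loss is reweighted by adversarial strength and distributional uncertainty. It is only a minimal illustration under assumed choices: the Gaussian parameterization, the specific weighting formula, and the names AdvDistributionHead, sample_adv_features, and reweighted_alignment_loss are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdvDistributionHead(nn.Module):
    """Predicts mean and log-variance of a Gaussian over adversarial features
    (an assumed parameterization of the latent adversarial distribution)."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.mu = nn.Linear(feat_dim, feat_dim)
        self.logvar = nn.Linear(feat_dim, feat_dim)

    def forward(self, feat_nat: torch.Tensor):
        return self.mu(feat_nat), self.logvar(feat_nat)


def sample_adv_features(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Reparameterized draw from the estimated adversarial feature distribution."""
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)


def reweighted_alignment_loss(feat_nat, feat_adv, mu, logvar):
    """Decoupled alignment: pull natural features toward the estimated adversarial
    distribution and toward the concrete adversarial feature, reweighting each
    example by adversarial strength and distribution uncertainty (illustrative)."""
    var = logvar.exp()
    # Gaussian negative log-likelihood of the natural feature under the
    # estimated adversarial distribution (constant terms dropped).
    nll = 0.5 * (((feat_nat - mu) ** 2) / var + logvar).sum(dim=1)
    # Point-wise alignment with the original adversarial example's feature.
    point = F.mse_loss(feat_nat, feat_adv, reduction="none").sum(dim=1)
    # Reweighting: stronger adversaries and lower-uncertainty distributions
    # receive larger weights (one plausible choice, not the paper's formula).
    strength = (feat_adv - feat_nat).norm(dim=1).detach()
    uncertainty = var.mean(dim=1).detach()
    weights = torch.softmax(strength / (uncertainty + 1e-8), dim=0)
    return (weights * (nll + point)).sum()


if __name__ == "__main__":
    feat_dim, batch = 128, 8
    head = AdvDistributionHead(feat_dim)
    feat_nat = torch.randn(batch, feat_dim)                    # clean features from a backbone
    feat_adv = feat_nat + 0.1 * torch.randn(batch, feat_dim)   # stand-in for PGD-attacked features
    mu, logvar = head(feat_nat)
    feat_sampled = sample_adv_features(mu, logvar)             # would feed the classifier in training
    loss = reweighted_alignment_loss(feat_nat, feat_adv, mu, logvar)
    print(feat_sampled.shape, loss.item())
```

In a full training loop, the classification loss would typically be computed on features sampled from this distribution alongside the concrete adversarial features, and combined with the alignment term above.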
