EAR-VM: Exploring Methods for Improving Adversarial Robustness of Vision Models

 Abstract

CNNs are widely used, particularly in Computer Vision, but their vulnerability to adversarial attacks leaves a lot to be desired, and improving their robustness to such attacks is essential to making them safer. The misuse of adversarial attacks is a major threat to CNN vision models; for example, self-driving cars can be made to misinterpret road signs or signals, putting passengers at significant risk of harm. To address this vulnerability, we attempt to modify the architecture of a CNN by adding an auxiliary classification SVM, which determines the maximum margin within which adversarial attacks impact the loss. Interpretability is also a key concept in understanding the decisions and outputs of modern networks: by interpreting how the model works, we can craft better adversarial attacks, or make the model more robust.

 

Objectives

Our goal is first to implement the auxiliary-trained SVM (AT-SVM), and then to test whether it produces any significant change in the adversarial robustness of CNNs. This means testing adversarial attack methods on CNNs trained both on clean data and with Adversarial Training, a simple empirical defense already known to reliably increase the robustness of CNNs. While such defenses can improve the adversarial robustness of vision models significantly, they come at the cost of reduced accuracy on clean data.

 

One of these training methods is RST (Robust Self-Training). For the interpretability part, we implemented D-RISE (Detector Randomized Input Sampling for Explanation), a black-box technique for generating saliency maps for object detectors. It shows the regions of the image on which the detector focused while making a particular prediction and helps us understand how the model works. We then construct a PGD-like adversarial attack for object detection models which perturbs only the region of interest for a given detection.

 

Background and Hypothesis

- Components

For the project, we largely stuck with the ResNet-18 vision model, using MNIST as a “simple” dataset and CIFAR-10 as a more complex one. For our work on interpretability we focused on the Faster R-CNN model. Support Vector Machines (SVMs) are supervised maximum-margin models used for both classification and regression tasks. An SVM classifier, which is our focus here, treats classification on a d-dimensional dataset as the task of finding the (d−1)-dimensional boundaries that provide the maximal margin between our classes.
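As a point of reference, the textbook soft-margin objective for the binary case (a standard formulation, not the AT-SVM paper's exact one) finds the separating hyperplane w·x + b = 0 by solving:

```latex
\min_{w,\,b}\;\; \tfrac{1}{2}\lVert w \rVert^{2} \;+\; C \sum_{i=1}^{n} \max\!\bigl(0,\; 1 - y_i\,(w^{\top} x_i + b)\bigr)
```

The first term maximises the margin (which equals 2/‖w‖), while the hinge-loss term penalises points that fall inside the margin or on the wrong side of the boundary.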

 

- Interpretability

In the context of ML models, interpretability can be seen as the degree to which the model's predictions can be comprehended. This matters because a model's strengths and weaknesses in classification can be better evaluated when it is known how the model learns from data.

 

 

For the interpretability aspect, we used the D-RISE method, which provides black-box explanations of object detection models via saliency maps. We further constructed an adversarial attack for object detection models based on extracting and perturbing the region of interest for a particular detection; a sketch of the idea follows.
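The following is a minimal sketch of that idea, assuming a differentiable detector wrapped so that `loss_fn` reduces its outputs to a scalar, and a box `(x1, y1, x2, y2)` for the detection being attacked; the names and hyperparameters are illustrative, not our exact implementation.

```python
import torch

def roi_pgd_attack(model, image, box, loss_fn, eps=8/255, alpha=2/255, steps=10):
    """PGD-style attack confined to a region of interest (ROI).

    image: (C, H, W) tensor in [0, 1]; box: (x1, y1, x2, y2) pixel coordinates
    of the detection being attacked; loss_fn reduces the detector output to a scalar.
    """
    x1, y1, x2, y2 = box
    mask = torch.zeros_like(image)
    mask[:, y1:y2, x1:x2] = 1.0                     # 1 inside the ROI, 0 elsewhere

    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv.unsqueeze(0)))
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign() * mask                     # ascend the loss, ROI only
            adv = image + torch.clamp(adv - image, -eps, eps) * mask   # project into the eps-ball
            adv = adv.clamp(0.0, 1.0)
        adv = adv.detach()
    return adv
```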

Methods

Attacks and defenses

We started off with implementations of some industry-standard adversarial attacks: the FGSM attack and the PGD attack, both targeted and untargeted. These attacks are used in many adversarial training methods. The simplest of these is SAT (standard adversarial training), a basic empirical defense that replaces a small portion of the training data with adversarially perturbed examples to improve the robustness of a model, at the cost of a notable reduction in general accuracy.
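For reference, FGSM takes a single step in the direction of the sign of the loss gradient; a minimal untargeted version for a classifier might look like the following (illustrative PyTorch, not our exact code):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """Single-step untargeted FGSM: nudge every pixel by eps in the
    direction that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()
```

PGD simply iterates a smaller version of this step several times and projects the result back into an ε-ball around the original image, as in the ROI-restricted sketch above.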


To counter this, we use RST. RST is a more robust adversarial training method that also achieves better accuracy on clean data. It does so through a semi-supervised learning framework: the guiding idea is that having more data to train on, even if the extra data is pseudo-labeled rather than ground truth, is better for both adversarial robustness and clean accuracy than having only a limited amount of labeled data.
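A rough sketch of this recipe is below, under the assumption that the model has already been trained on the clean labeled set and that `attack` is a PGD-style perturbation routine; all names and hyperparameters here are illustrative, not our exact code.

```python
import torch
import torch.nn.functional as F

def robust_self_training(model, labeled_dataset, unlabeled_images, attack, optimizer, epochs=10):
    """Sketch of robust self-training (RST)."""
    # 1. Pseudo-label the unlabeled pool with the pre-trained model.
    model.eval()
    with torch.no_grad():
        pseudo_labels = model(unlabeled_images).argmax(dim=1)

    # 2. Combine labeled and pseudo-labeled data into one training set.
    extra = torch.utils.data.TensorDataset(unlabeled_images, pseudo_labels)
    combined = torch.utils.data.ConcatDataset([labeled_dataset, extra])
    loader = torch.utils.data.DataLoader(combined, batch_size=128, shuffle=True)

    # 3. Adversarially train on the combined set.
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x_adv = attack(model, x, y)              # perturb the batch (e.g. PGD)
            loss = F.cross_entropy(model(x_adv), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```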

When we use deep learning models for object detection, it's often crucial to understand how these models make their decisions. D-RISE is a method designed to do just that by creating visual explanations, or saliency maps, which highlight the most influential parts of an image that lead to a particular detection.

The process begins with an image in which an object has been detected. D-RISE generates many random masks that are applied over the image, each one hiding different parts of the image before it is fed into the object detection model. The detections produced on each masked image are compared against the target detection (using the overlap of the boxes and the similarity of the class scores), and each mask is weighted by that similarity. Summing the weighted masks produces a saliency map highlighting which regions of the image are most critical for the model to recognise the object.
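A condensed sketch of that loop is below; the `similarity` function, which compares the detections on a masked image against the target detection (e.g. by box overlap and score similarity), is assumed rather than spelled out, and the names are illustrative.

```python
import torch

def drise_saliency(detector, image, target_det, similarity, n_masks=1000, grid=16, p=0.5):
    """Sketch of D-RISE: accumulate random masks weighted by how well the
    detector's output on each masked image matches the target detection."""
    _, H, W = image.shape
    saliency = torch.zeros(H, W)
    for _ in range(n_masks):
        # Low-resolution binary mask, upsampled to image size (soft edges).
        coarse = (torch.rand(1, 1, grid, grid) < p).float()
        mask = torch.nn.functional.interpolate(
            coarse, size=(H, W), mode='bilinear', align_corners=False)[0, 0]
        masked = image * mask                       # hide parts of the image
        dets = detector(masked.unsqueeze(0))        # run the detector
        w = similarity(dets, target_det)            # how well it still finds the target
        saliency += w * mask                        # weight the mask by that score
    return saliency / n_masks
```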

 








Results

D-RISE

[Figure: Attacking the 'horse' detection]

[Figure: Saliency map for the perturbed image]

[Figure: Saliency maps for multiple images of the 'person' class]

[Figure: RST accuracy plotted against accuracy without adversarial training]

Conclusion

D-RISE and the Attack

The D-RISE method gives multiple insights into the decisions of object detectors. It identifies the region of focus for a detection and outperforms other methods such as Grad-CAM on the metrics reported in its paper. On perturbing the ROI with our attack, we achieve some success in the form of a change in the detector's output.

For the 'person' class, the detector usually focuses on the upper half of the body, revealing a bias in the model.

 

On perturbing the ROI for the 'horse' detection, we observe not only a slight decrease in the confidence score of the 'horse' detection, but also a greater decrease in the confidence of the background 'person' detection, which has no perturbation applied to it.

 
The saliency maps show that, in the detection on the original image, the focus was on the face and neck of the horse. After perturbation, the focus on the horse is still intact, but it also shifts towards the 'person' detection in the foreground (on the horse). This explains why the confidence score of the background 'person' detection might have decreased.

 

SAT & RST 

Through standard adversarial training we observe that, although the model becomes substantially more robust to perturbed data, its accuracy on clean data somewhat suffers. With the semi-supervised learning framework of RST we are able to mitigate this and arrive at a good trade-off between adversarial and clean accuracy, performing better than SAT across the board.

AT-SVM

While the concept is novel and intriguing, and makes theoretical sense, we could not replicate the results of the paper that proposed this architecture ourselves. The paper describing this method has some mathematical inconsistencies in the gradient update step of the SVM auxiliary classifier. While this is a negative result, we believe it was still an insightful undertaking, and the idea is well worth exploring and building on.



Links:

https://youtu.be/H-3Q_45Qpv0

https://files.catbox.moe/4t806g.pdf

