Variegated Machine Unlearning - KowalskiAnalysis
In the age of machine learning, data is widely used and, at times, abused. The notion of machine unlearning comes into play to tackle these challenges, with a variety of uses in privacy protection, poison removal, and more.
Data poisoning is a type of cyber-attack in which an adversary intentionally compromises a training dataset used by an AI or machine learning (ML) model to influence or manipulate the operation of that model.
We take inspiration from the Corrective Machine Unlearning paper by Goel et al. to explore some interesting cases with performance implications for machine unlearning.
We explored multiple problems:
1. Machine Unlearning of Poisons over Imbalanced Datasets: Given the inherent disparity in class representation in imbalanced datasets, we hypothesize that the impact of unlearning poisons from such datasets should be equally significant: a poorly represented class riddled with poisons would benefit greatly from unlearning of the poison.
2. Machine Unlearning over Poisons considering False-Poisons: Algorithms are not perfect, and it may happen that only a fraction of the total poisoned samples is identified; this problem is explored in the aforementioned paper. We further explore the case of false-poison detection: what happens when algorithms also falsely report clean samples as poisons?
How did we do it?
- False Poison Detection:
- Dataset: The dataset used is CIFAR10 which consists of images from 10 different classes.
- Model: ResNet9
- If Sf is the size of the deletion set and Cf is the fraction of clean images to be deleted along with poisoned images, then (Sf - Cf * size of clean dataset) poisoned images and (Cf * size of clean dataset) clean images are deleted, i.e., Sf images in total. We run experiments with Cf set to 0.2, 0.3, and 0.4; images are selected uniformly at random from the dataset.
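The deletion-set construction above can be sketched as follows. This is a minimal NumPy sketch, not the project's actual code: the function name is illustrative, and as a simplifying assumption Cf is treated here as the fraction of the deletion set itself that is clean (so the clean and poisoned counts sum to Sf, matching the total above).

```python
import numpy as np

def build_deletion_set(poison_idx, clean_idx, s_f, c_f, seed=0):
    """Sample a deletion set of size s_f in which a c_f fraction of the
    entries are falsely flagged clean images and the rest are poisons.

    poison_idx / clean_idx: index arrays into the training set.
    """
    rng = np.random.default_rng(seed)
    n_clean = int(c_f * s_f)       # clean images deleted by mistake
    n_poison = s_f - n_clean       # correctly identified poisons
    chosen_poison = rng.choice(poison_idx, size=n_poison, replace=False)
    chosen_clean = rng.choice(clean_idx, size=n_clean, replace=False)
    return np.concatenate([chosen_poison, chosen_clean])

# e.g. 500 poisoned indices, the rest clean, Sf=100, Cf=0.2
deletion = build_deletion_set(np.arange(500), np.arange(500, 50000),
                              s_f=100, c_f=0.2)
```

With Cf = 0.2 and Sf = 100, the deletion set contains 80 poisoned and 20 clean images.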
- Class Imbalance:
- We consider the problem of class imbalance in datasets and the impact of machine unlearning of poisons under such settings.
- Dataset: CIFAR10 (cats and dogs only); we choose cats and dogs for their visual similarity (and the resulting difficulty in distinguishing them) relative to, say, cats/ships or dogs/trucks.
- Model: ResNet9 (custom); we choose a relatively small model since our training dataset is small (<=10k images).
- Training is done on 5k dog images and a varying number of cat images, covering several degrees of imbalance: cat:dog ratios of 0.2, 0.4, 0.7, and 1.0, i.e., 1k, 2k, 3.5k, and 5k cat images respectively.
- Poisoning: We use a simple poison, a 3x3 patch applied to the bottom-right corner of cat images and the top-left corner of dog images. The simplicity of the poison makes it easy for the model to learn shortcuts.
- We poison 100 training images in total and vary the fraction of poisoned cat samples between 0.2 and 0.9, i.e., either 20 or 90 cat samples are poisoned; this lets us study how unlearning methods perform relative to the degree of poisoning of the smaller class (here, cats). Poisoned images' labels are swapped, i.e., cat->dog and vice versa.

Sample poisoned cat/dog images
- (Images for reference purposes only and not part of the actual dataset, CIFAR10 images are 32x32 in dimension)
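The patch poisoning and label flipping described above can be sketched as below. This is a minimal NumPy sketch under stated assumptions rather than the actual pipeline: the label ids, function names, and the white patch value (255) are illustrative, and images are plain HxWxC uint8 arrays instead of CIFAR10 tensors.

```python
import numpy as np

CAT, DOG = 0, 1  # illustrative label ids

def apply_patch(img, label):
    """Stamp a 3x3 patch (white, as an assumption): bottom-right corner
    for cats, top-left for dogs, then flip the label (cat<->dog)."""
    img = img.copy()
    if label == CAT:
        img[-3:, -3:, :] = 255
    else:
        img[:3, :3, :] = 255
    return img, 1 - label

def poison_dataset(images, labels, n_poison=100, cat_frac=0.2, seed=0):
    """Poison n_poison images in total, a cat_frac share of them cats."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_cat = int(cat_frac * n_poison)
    cat_idx = rng.choice(np.flatnonzero(labels == CAT), n_cat, replace=False)
    dog_idx = rng.choice(np.flatnonzero(labels == DOG), n_poison - n_cat,
                         replace=False)
    poisoned = np.concatenate([cat_idx, dog_idx])
    for i in poisoned:
        images[i], labels[i] = apply_patch(images[i], labels[i])
    return images, labels, poisoned
```

With cat_frac = 0.2 and n_poison = 100, this poisons 20 cats and 80 dogs, matching the 0.2 setting above.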
- Unlearning: We explore two tried-and-tested unlearning methods, Catastrophic Forgetting and Exact Unlearning. Across our experiments, Exact Unlearning outperforms Catastrophic Forgetting, which agrees with the existing understanding of the two methods.
- Test: We create a test dataset with 1k cat and 1k dog images. Keeping the test dataset balanced is important: accuracy then better captures the model's performance on the smaller class, which gets 50% representation in test, so poor performance on that class proportionally lowers test accuracy.
- We evaluate on two test datasets: the balanced one described above, with no poisoned images, and an adversarially generated one in which every image is poisoned; the latter measures performance in the adversarial case.
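As a toy illustration of the two unlearning methods, the sketch below uses a gradient-descent logistic regression as a stand-in for the ResNet9 (all function names are illustrative, and this is not the project's actual training code). Exact Unlearning retrains from scratch on the retain set; Catastrophic Forgetting keeps fine-tuning the compromised model on the retain set only.

```python
import numpy as np

def train(X, y, w=None, epochs=50, lr=0.1):
    """Gradient-descent logistic regression; stands in for model training."""
    if w is None:                    # train from scratch
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def exact_unlearning(X, y, forget_mask):
    """Retrain from scratch on the retain set only, guaranteeing the
    forgotten samples never influenced the final model."""
    keep = ~forget_mask
    return train(X[keep], y[keep], w=None)

def catastrophic_forgetting(X, y, forget_mask, w_poisoned, epochs=10):
    """Continue fine-tuning the compromised model on the retain set,
    relying on drift to wash out the forgotten samples' influence."""
    keep = ~forget_mask
    return train(X[keep], y[keep], w=w_poisoned.copy(), epochs=epochs)
```

The retrain-from-scratch guarantee is what makes Exact Unlearning the stronger (but costlier) method, consistent with the comparison above.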
Results
- Effect of False Positives on Accuracy:
- Effect of Unlearning Poisons over Class Imbalance:
Observations
Team Members
Links
- GitHub Repository:
- Report Link: KowalskiAnalysis Report
References
Video



