Variegated Machine Unlearning - KowalskiAnalysis
In the age of machine learning, data is widely used and, at times, abused. The notion of machine unlearning comes into play to tackle these challenges, with a variety of uses in privacy protection, poison removal, and more.
Data poisoning is a type of cyber-attack in which an adversary intentionally compromises a training dataset used by an AI or machine learning (ML) model to influence or manipulate the operation of that model.
We take inspiration from the Corrective Machine Unlearning paper by Goel et al. to explore some interesting cases with performance implications for machine unlearning.
We explored multiple problems:
1. Machine Unlearning of Poisons over Imbalanced Datasets: Given the inherent disparity in class representation in imbalanced datasets, we hypothesize that the impact of unlearning poisons from such datasets should be equally significant: a poorly represented class riddled with poisons would benefit greatly from unlearning of the poison.
2. Machine Unlearning over Poisons considering False-Poisons: Algorithms are not perfect, and it may happen that only a fraction of the total poisoned samples is identified; this problem is explored in the aforementioned paper. We further explore the case of false-poison detection: what happens when algorithms also falsely report clean samples as poisons?
How did we do it?
- False Poison Detection:
- Dataset: The dataset used is CIFAR10 which consists of images from 10 different classes.
- Model: ResNet9
- If Sf is the size of the deletion set and Cf is the fraction of clean images to be deleted along with poisoned images, then (Sf - Cf * size of clean dataset) poisoned images and (Cf * size of clean dataset) clean images are deleted, i.e., Sf images in total. We run experiments with Cf set to 0.2, 0.3, and 0.4; images are selected uniformly at random from the dataset.
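The deletion-set construction above can be sketched as follows. This is a minimal NumPy sketch, not the project's actual code: the function name is illustrative, and as a simplifying assumption Cf is treated here as the fraction of the deletion set itself that is clean (so the clean and poisoned counts sum to Sf, matching the total above).

```python
import numpy as np

def build_deletion_set(poison_idx, clean_idx, s_f, c_f, seed=0):
    """Sample a deletion set of size s_f in which a c_f fraction of the
    entries are falsely flagged clean images and the rest are poisons.

    poison_idx / clean_idx: index arrays into the training set.
    """
    rng = np.random.default_rng(seed)
    n_clean = int(c_f * s_f)       # clean images deleted by mistake
    n_poison = s_f - n_clean       # correctly identified poisons
    chosen_poison = rng.choice(poison_idx, size=n_poison, replace=False)
    chosen_clean = rng.choice(clean_idx, size=n_clean, replace=False)
    return np.concatenate([chosen_poison, chosen_clean])

# e.g. 500 poisoned indices, the rest clean, Sf=100, Cf=0.2
deletion = build_deletion_set(np.arange(500), np.arange(500, 50000),
                              s_f=100, c_f=0.2)
```

With Cf = 0.2 and Sf = 100, the deletion set contains 80 poisoned and 20 clean images.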
- Class Imbalance:
- We consider the problem of class imbalance in datasets and the impact of machine unlearning of poisons under such settings.
- Dataset: CIFAR10 (cats and dogs only); we choose cats and dogs for their visual similarity (and the resulting difficulty in distinguishing them) relative to, say, cats/ships or dogs/trucks.
- Model: ResNet9 (custom); we choose a relatively small model since our training dataset is small (<=10k images).
- Training is done on 5k dog images and a varying number of cat images, covering several degrees of imbalance: cat:dog ratios of 0.2, 0.4, 0.7, and 1.0, i.e., 1k, 2k, 3.5k, and 5k cat images respectively.
- Poisoning: We use a simple poison, a 3x3 patch applied to the bottom-right corner of cat images and the top-left corner of dog images. The simplicity of the poison makes it easy for the model to learn shortcuts.
- We poison 100 training images in total and vary the fraction of poisoned cat samples between 0.2 and 0.9, i.e., either 20 or 90 cat samples are poisoned; this lets us study how unlearning methods perform relative to the degree of poisoning of the smaller class (here, cats). Poisoned images' labels are swapped, i.e., cat->dog and vice versa.

Sample poisoned cat/dog images
- (Images for reference purposes only and not part of the actual dataset, CIFAR10 images are 32x32 in dimension)
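The patch poisoning and label flipping described above can be sketched as below. This is a minimal NumPy sketch under stated assumptions rather than the actual pipeline: the label ids, function names, and the white patch value (255) are illustrative, and images are plain HxWxC uint8 arrays instead of CIFAR10 tensors.

```python
import numpy as np

CAT, DOG = 0, 1  # illustrative label ids

def apply_patch(img, label):
    """Stamp a 3x3 patch (white, as an assumption): bottom-right corner
    for cats, top-left for dogs, then flip the label (cat<->dog)."""
    img = img.copy()
    if label == CAT:
        img[-3:, -3:, :] = 255
    else:
        img[:3, :3, :] = 255
    return img, 1 - label

def poison_dataset(images, labels, n_poison=100, cat_frac=0.2, seed=0):
    """Poison n_poison images in total, a cat_frac share of them cats."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_cat = int(cat_frac * n_poison)
    cat_idx = rng.choice(np.flatnonzero(labels == CAT), n_cat, replace=False)
    dog_idx = rng.choice(np.flatnonzero(labels == DOG), n_poison - n_cat,
                         replace=False)
    poisoned = np.concatenate([cat_idx, dog_idx])
    for i in poisoned:
        images[i], labels[i] = apply_patch(images[i], labels[i])
    return images, labels, poisoned
```

With cat_frac = 0.2 and n_poison = 100, this poisons 20 cats and 80 dogs, matching the 0.2 setting above.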
- Unlearning: We explore two tried-and-tested unlearning methods, Catastrophic Forgetting and Exact Unlearning. Across our experiments, Exact Unlearning outperforms Catastrophic Forgetting, which agrees with the existing understanding of the two methods.
- Test: We create a test dataset with 1k cat and 1k dog images. Keeping the test dataset balanced is important: accuracy then better captures the model's performance on the smaller class, which gets 50% representation in test, so poor performance on that class proportionally lowers test accuracy.
- We evaluate on two test datasets: the balanced one described above, with no poisoned images, and an adversarially generated one in which every image is poisoned; the latter measures performance in the adversarial case.
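As a toy illustration of the two unlearning methods, the sketch below uses a gradient-descent logistic regression as a stand-in for the ResNet9 (all function names are illustrative, and this is not the project's actual training code). Exact Unlearning retrains from scratch on the retain set; Catastrophic Forgetting keeps fine-tuning the compromised model on the retain set only.

```python
import numpy as np

def train(X, y, w=None, epochs=50, lr=0.1):
    """Gradient-descent logistic regression; stands in for model training."""
    if w is None:                    # train from scratch
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def exact_unlearning(X, y, forget_mask):
    """Retrain from scratch on the retain set only, guaranteeing the
    forgotten samples never influenced the final model."""
    keep = ~forget_mask
    return train(X[keep], y[keep], w=None)

def catastrophic_forgetting(X, y, forget_mask, w_poisoned, epochs=10):
    """Continue fine-tuning the compromised model on the retain set,
    relying on drift to wash out the forgotten samples' influence."""
    keep = ~forget_mask
    return train(X[keep], y[keep], w=w_poisoned.copy(), epochs=epochs)
```

The retrain-from-scratch guarantee is what makes Exact Unlearning the stronger (but costlier) method, consistent with the comparison above.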
Results
- Effect of False Positives on Accuracy:
- Effect of Unlearning Poisons over Class Imbalance:
Observations
Team Members
Links
- GitHub Repository:
- Report Link: KowalskiAnalysis Report
References
Video



