This repository contains the code for a course project done in the Computational Intelligence Lab 2024 at ETH Zurich by
Adversarial vulnerability remains a significant challenge for deep neural networks, as inputs manipulated with imperceptible perturbations can induce misclassification. Recent research posits that natural data occupies low-dimensional manifolds, while adversarial samples reside in the ambient space beyond these manifolds. Motivated by this off-manifold hypothesis, we propose and examine a novel defense mechanism that employs manifold-learning normalizing flows (M-Flows) to project input samples onto approximations of the data manifold prior to classification.
We illustrate the underlying principles of our method with a low-dimensional pedagogical example before testing its effectiveness on high-dimensional natural image data. While our method shows promise in principle on low-dimensional data, learning the data manifold proves highly unstable and sensitive to initial conditions. On image data, our method fails to surpass the baseline.
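To make the proposed defense concrete: conceptually, the classifier never sees the raw input, only its projection onto the learned manifold. Below is a minimal sketch of this pipeline, assuming a trained M-flow exposing hypothetical `encode`/`decode` maps between data space and manifold coordinates; it is an illustration, not the repository's exact implementation.

```python
import torch

@torch.no_grad()
def defended_predict(x, mflow, classifier):
    """Classify the on-manifold projection of x instead of x itself.

    `mflow.encode` / `mflow.decode` are hypothetical names for the M-flow's
    maps to and from manifold coordinates; the actual API may differ.
    """
    u = mflow.encode(x)       # manifold (latent) coordinates of x
    x_proj = mflow.decode(u)  # on-manifold reconstruction of x
    return classifier(x_proj).argmax(dim=-1)
```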
Supplementary animations that illustrate some of the discussed dynamics can be found at https://chrisoffner.github.io/mflow_defence/.
- `code/two_spirals.ipynb`: Trains the Two Spirals classifier.
- `code/attack_spiral_classifier.ipynb`: Adversarial attacks on, and defense of, the Two Spirals classifier. Fig. 1 in the report was created here.
- `code/defense_cases_frequency.ipynb`: Measures the relative frequency of attack/defense cases (A)-(D) as described in Sec. 3 of the report. Fig. 5 was created here.
- `code/spiral_manifold_projection.ipynb`: Visualises the learned on-manifold projection. The animations, Fig. 3, and Fig. 4 from the report were created here.
- `code/attack_cifar10.ipynb`: Demo for attacking CIFAR-10 images. NOTE: For automatic generation, see the script `generate_attacked_cifar10.py` below. A minimal sketch of such an attack follows this list.
- `code/manifold_defense_cases_frequency.ipynb`: Measures the relative frequency of attack/defense cases (A)-(D) on CIFAR-10 under FGSM and PGD attacks.
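The attacks used throughout are FGSM and its iterative variant PGD. For orientation, a single FGSM step can be sketched as follows; this is the standard formulation, not the exact code in `generate_attacked_cifar10.py`:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """One-step FGSM: move x by eps along the sign of the loss gradient.

    Assumes images in [0, 1] and an L-infinity budget eps (e.g. 8/255).
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).clamp(0.0, 1.0).detach()
```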
- `code/two_spirals_utils.py`: Generates the Two Spirals dataset.
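  For reference, a typical Two Spirals construction looks like the following (hypothetical signature; see the file itself for the actual parameters):

  ```python
  import numpy as np

  def two_spirals(n_per_class=500, noise=0.1, seed=0):
      # Hypothetical signature; two_spirals_utils.py defines the actual one.
      rng = np.random.default_rng(seed)
      theta = np.sqrt(rng.uniform(0.0, 1.0, n_per_class)) * 3 * np.pi
      arm = np.stack([theta * np.cos(theta), theta * np.sin(theta)], axis=1)
      x = np.concatenate([arm, -arm])       # second spiral = mirrored arm
      x += rng.normal(0.0, noise, x.shape)  # jitter off the 1-D manifold
      y = np.repeat([0, 1], n_per_class)
      return x, y
  ```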
- `code/generate_attacked_cifar10.py`: Generates datasets of adversarial FGSM and PGD attacks against the CIFAR-10 dataset for the specified perturbation magnitudes. The default parameters correspond to the attacks we performed.

  Requirements:
  - Must be run inside `code/`.
  - Trained ResNet-50 classifier checkpoint saved in `models/resnet/resnet50_cifar10.pt` (instructions here).
- `code/undefended_resnet_accuracy.py`: Measures the classification accuracy of the ResNet-50 classifier on attacked inputs.

  Requirements:
  - Must be run inside `code/`.
  - Trained ResNet-50 classifier checkpoint saved in `models/resnet/resnet50_cifar10.pt`.
  - Attacked datasets generated with `generate_attacked_cifar10.py`.
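  A generic evaluation loop of the kind this script performs (illustrative, not the script's exact code):

  ```python
  import torch

  @torch.no_grad()
  def accuracy(model, loader, device="cuda"):
      # Fraction of correctly classified samples over the whole loader.
      model.eval()
      correct = total = 0
      for x, y in loader:
          pred = model(x.to(device)).argmax(dim=-1)
          correct += (pred == y.to(device)).sum().item()
          total += y.numel()
      return correct / total
  ```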
- `code/pixeldefend_cases_frequency.py`: Measures the relative frequency of attack/defense cases for the PixelDefend baseline on CIFAR-10.

  Requirements:
  - Must be run inside `code/`.
  - Trained ResNet-50 classifier checkpoint saved in `models/resnet/resnet50_cifar10.pt`.
  - Attacked datasets generated with `code/generate_attacked_cifar10.py`.
  - Datasets purified with PixelDefend using a defense radius $\epsilon_\text{def} = 16$. For each attack `<attack>` and attack perturbation magnitude `<eps>`, the script expects to find the corresponding purified dataset in `data/cifar10_pixeldefend/` as a tarball named `cifar10_<attack>_atkeps_<eps>_defeps_16.tar.gz`. The tarball should contain the dataset as a single file named `cifar10_<attack>_atkeps_<eps>_defeps_16.pt`.
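  The naming convention implies a loading step of roughly this shape (`load_purified` is illustrative and not part of the repository):

  ```python
  import tarfile
  import torch

  def load_purified(attack, eps, root="data/cifar10_pixeldefend"):
      # Unpack the expected tarball and load the purified dataset tensor.
      name = f"cifar10_{attack}_atkeps_{eps}_defeps_16"
      with tarfile.open(f"{root}/{name}.tar.gz", "r:gz") as tar:
          tar.extractall(root)
      return torch.load(f"{root}/{name}.pt")
  ```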
- `code/cifar10_manifold.py`: Script used to train the M-flow on the CIFAR-10 dataset.