(Left) Overview of FreqFit. (Right) Performance gains with ImageNet-21K (left panel) and MoCo (right panel).
This repository is heavily based on the official PyTorch implementation of Visual Prompt Tuning (ECCV 2022).
- 01/2025: Added Filter2DParams, a parameter-reduced version of the original FreqFit. Here, we drop the filter's imaginary values, halving the number of parameters, and use the resulting real-valued filter to adjust both the real and imaginary parts of the input's spectrum. This filter produces the results in the ViT-L results table below (an illustrative sketch of the idea follows the table).
We evaluate FreqFit2D and its parameter-reduced ablation Filter2DParams on VTAB Natural with ViT-L. More results will be added.
ViT-L | Cifar100 | Caltech101 | DTD | Flower102 | Pets | SVHN | Sun397 | Mean | Params |
---|---|---|---|---|---|---|---|---|---|
LoRA | 77.2 | 90.6 | 68.6 | 98.8 | 89.1 | 83.4 | 56.2 | 80.6 | 3.5M+0.0M |
FreqFit-LoRA | 86.5 | 90.4 | 68.5 | 98.7 | 89.4 | 83.7 | 56.2 | 81.9 | 3.5M+5.7M |
FreqFit*-LoRA | 86.7 | 90.4 | 68.8 | 98.7 | 89.2 | 83.5 | 56.1 | 81.9 | 3.5M+2.9M |
------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
RLRR | 79.0 | 90.8 | 69.3 | 98.8 | 89.3 | 87.7 | 54.9 | 81.4 | 0.76M+0.0M |
FreqFit*-RLRR | 82.6 | 91.3 | 68.6 | 99.0 | 90.1 | 87.2 | 54.5 | 81.9 | 0.76M+2.9M |
Tab. ViT-L results. FreqFit and its parameter-reduced ablation generalize well to large-scale models. FreqFit* denotes results obtained with Filter2DParams, which is on par with the original FreqFit at half the parameter count.
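For intuition, the following is a minimal, illustrative sketch of the kind of spectral filtering FreqFit performs and of how Filter2DParams halves the parameters by keeping only a real-valued filter. The module name, initialization, and residual placement are assumptions for illustration; refer to `src/models/gfn.py` for the actual implementation.

```python
import torch
import torch.nn as nn


class SpectralFilterSketch(nn.Module):
    """Illustrative FreqFit-style filter on ViT token features (not the repo code)."""

    def __init__(self, num_tokens: int, dim: int, real_only: bool = True):
        super().__init__()
        freq_shape = (num_tokens, dim // 2 + 1)  # rfft2 keeps roughly half of the last dim
        if real_only:
            # Filter2DParams idea: a real-valued filter with half the parameters,
            # applied to both the real and imaginary parts of the spectrum.
            self.filter = nn.Parameter(0.02 * torch.randn(*freq_shape))
        else:
            # original FreqFit idea: a complex-valued filter (real + imaginary parts)
            self.filter = nn.Parameter(0.02 * torch.randn(*freq_shape, 2))
        self.real_only = real_only
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        res = x
        spec = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")   # complex spectrum
        if self.real_only:
            spec = spec * self.filter                          # scales Re and Im equally
        else:
            spec = spec * torch.view_as_complex(self.filter)
        x = torch.fft.irfft2(spec, s=res.shape[1:], dim=(1, 2), norm="ortho")
        return x + self.bias + res                             # residual connection (assumed)


# e.g. 197 tokens (CLS + 14x14 patches) and 768 channels for ViT-B/16
out = SpectralFilterSketch(num_tokens=197, dim=768)(torch.randn(2, 197, 768))
```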
See `env_setup.sh` or `assets/freqfit.yml`.
Please follow the VPT dataset preparation instructions and VTAB_SETUP.md.
Download and place the pre-trained Transformer-based backbones in the `pretrained` folder or at `MODEL.MODEL_ROOT`. Note that, unlike VPT, we use the self-supervised pre-trained weights for MoCo v3. Once downloaded, update the pre-trained backbone names in `MODEL_ZOO` in `src/build_vit_backbone.py` accordingly (see the illustrative example after the table below).
Pre-trained Backbone | Pre-trained Objective | Link |
---|---|---|
ViT-B/16 | Supervised | link |
ViT-B/16 | MoCo v3 | link |
ViT-B/16 | MAE | link |
ViT-B/16 | CLIP | link |
ViT-L/16 | Supervised | link |
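For reference, `MODEL_ZOO` is a plain dictionary mapping encoder names (as passed to `run.sh`) to checkpoint filenames under `MODEL.MODEL_ROOT`. The entries below only illustrate the expected pattern; the checkpoint filenames are assumptions, so keep them in sync with the files you actually downloaded.

```python
# src/build_vit_backbone.py (illustrative; not the full dictionary)
MODEL_ZOO = {
    # encoder name used in the example run at the end of this README;
    # the filename is an assumption -- use the name of your downloaded checkpoint
    "sup_vitb16_imagenet21k": "imagenet21k_ViT-B_16.npz",
    # hypothetical entry for a MoCo v3 self-supervised ViT-B/16 checkpoint
    "mocov3_vitb16": "mocov3_vit_base_300ep.pth.tar",
}
```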
Configs related to each PEFT method are listed in `src/config/configs.py`. They can also be changed in `run.sh`.
This repo supports the FreqFit and Scale-Shift (SSF) fine-tuning methods presented in the paper. To switch between them, open `run.sh` and set `FREQFIT "freqfit"` or `FREQFIT "ssf"`.
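For context, the scale-shift alternative corresponds to a learnable per-channel affine transform on frozen features (the `ssf_scale` / `ssf_shift` parameters that appear in the VERA snippet below). A minimal sketch, with assumed names and initialization:

```python
import torch
import torch.nn as nn


class ScaleShiftSketch(nn.Module):
    """Illustrative scale-shift (SSF-style) adapter: y = x * scale + shift."""

    def __init__(self, dim: int):
        super().__init__()
        self.ssf_scale = nn.Parameter(torch.ones(dim))   # start as identity scaling
        self.ssf_shift = nn.Parameter(torch.zeros(dim))  # start with no shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., dim); only scale and shift are trained, the backbone stays frozen
        return x * self.ssf_scale + self.ssf_shift
```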
- The code for the FreqFit method is in `src/models/gfn.py`.
- The code for integrating FreqFit into a PEFT method can be found in the vision transformer backbone of each method (`vit.py`), such as `src/models/vit_backbones/vit.py`; a simplified integration sketch appears after this list.
- To add a new PEFT method that is available in HuggingFace PEFT, simply go to `src/models/vit_models.py` and add a branch for it, e.g.:
```python
...
# add VERA
elif transfer_type == "vera":
    # https://huggingface.co/docs/peft/en/package_reference/vera
    from peft import VeraConfig, get_peft_model

    config = VeraConfig(
        r=cfg.MODEL.VERA.R,
        target_modules=["attn.query", "attn.value", "attn.key", "attn.out", "ffn.fc1", "ffn.fc2"],
        vera_dropout=0.1,
        bias="vera_only",
        modules_to_save=["classifier"],
    )
    self.enc = get_peft_model(self.enc, config)
    # keep the FreqFit / SSF parameters trainable after PEFT wrapping
    for k, p in self.enc.named_parameters():
        if "ssf_scale" in k or "ssf_shift" in k or "filter_layer" in k:
            p.requires_grad = True
...
```
In `run.sh`, set `MODEL.TRANSFER_TYPE "vera"`. Refer to the HuggingFace PEFT docs for config details.
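Optionally, call `self.enc.print_trainable_parameters()` (a standard PEFT helper) after wrapping to confirm that only the VeRA weights, the classifier head, and the FreqFit/SSF parameters remain trainable.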
- To add a custom PEFT method, build your custom method, then register it in `src/models/build_vit_backbone.py` and `src/models/vit_models.py`. Refer to LoRA at `src/models/vit_lora/vit_lora.py` as an example.
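To illustrate the integration point mentioned above, the snippet below sketches how a FreqFit-style filter layer could be invoked inside the ViT encoder. The class name, attribute names, and the filter placement are assumptions for illustration; the actual integration lives in `src/models/vit_backbones/vit.py`.

```python
import torch.nn as nn


class EncoderWithFreqFitSketch(nn.Module):
    """Illustrative only: pair trainable filter layers with frozen ViT blocks."""

    def __init__(self, blocks: nn.ModuleList, filter_layers: nn.ModuleList):
        super().__init__()
        self.blocks = blocks                # frozen transformer blocks
        self.filter_layers = filter_layers  # trainable FreqFit-style filters

    def forward(self, hidden_states):
        for block, filter_layer in zip(self.blocks, self.filter_layers):
            # assumed placement: filter the token features before each block
            hidden_states = filter_layer(hidden_states)
            hidden_states = block(hidden_states)
        return hidden_states
```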
Modify `run.sh` as needed. Then run:
```bash
bash run.sh [data_name] [encoder] [batch_size] [base_lr] [wd_lr] [num_tokens] [adapter_ratio] [freqfit/ssf]
```
For example, to run the `cifar100` dataset on `Imagenet-21k` with LoRA incorporated with FreqFit, make sure `MODEL.TRANSFER_TYPE` and the other LoRA configs have been set in `run.sh`:
```bash
--config-file configs/finetune/cub.yaml \
MODEL.TRANSFER_TYPE "lora" \
MODEL.LORA.RANK "8" \
MODEL.LORA.ALPHA "8" \
```
Then, execute:
```bash
bash run.sh cifar100 sup_vitb16_imagenet21k 64 0.1 0.01 0 0 freqfit
```
The majority of FreqFiT is licensed under the CC-BY-NC 4.0 license (see LICENSE for details). Portions of the project are available under separate license terms: google-research/task_adaptation and huggingface/transformers are licensed under the Apache 2.0 license; Swin-Transformer, ConvNeXt, and ViT-pytorch are licensed under the MIT license; and MoCo v3 and MAE are licensed under the Attribution-NonCommercial 4.0 International license.