Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting

[PyTorch] Code for the paper 'Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting' (CVPR eLVM Workshop, 2024).


Includes full fine-tuning, linear probing, and parameter-efficient strategies such as Block Expansion and LoRA for fine-tuning Vision Transformers (ViTs) on image classification.
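As a rough illustration of the LoRA idea, the sketch below wraps a frozen linear layer (such as a ViT attention projection) with a trainable low-rank update. It is a minimal conceptual example, not this repository's implementation; the class and parameter names are made up for illustration.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Minimal LoRA sketch: y = W x + (alpha / r) * B A x, with W frozen."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 8):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pretrained weight and bias
            # A is small-random, B is zero, so the update is zero at step 0
            self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

Only lora_a and lora_b receive gradients, so the trainable parameter count scales with the rank r rather than with the size of the base layer.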

Requirements

  • Python 3.8+
  • pip install -r requirements.txt

Available Datasets

Dataset                        --data.dataset
CIFAR-10                       cifar10
CIFAR-100                      cifar100
Oxford-IIIT Pet Dataset        pets37
Oxford Flowers-102             flowers102
Food-101                       food101
Describable Textures Dataset   dtd
Image Folder                   custom

Usage

  • configs/ contains example configuration files, which can be run with:
python main.py fit --config path/to/config

You can either edit an existing config with your own choice of hyperparameters or set them from the command line as follows:

python main.py fit --trainer.accelerator gpu --trainer.devices 1 --trainer.precision 16-mixed \
    --trainer.max_steps 5000 --model.warmup_steps 500 --model.lr 0.01 \
    --trainer.val_check_interval 500 --data.batch_size 128 --data.dataset cifar100
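Since the dotted command-line flags map one-to-one onto nested config keys, the command above corresponds to a YAML file along the following lines. This is an illustrative sketch inferred from the flags shown here; check the shipped files under configs/ for the exact schema.

    trainer:
      accelerator: gpu
      devices: 1
      precision: 16-mixed
      max_steps: 5000
      val_check_interval: 500
    model:
      lr: 0.01
      warmup_steps: 500
    data:
      dataset: cifar100
      batch_size: 128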

Examples

1. Full Fine-tuning:

  • To fully fine-tune a ViT-B/16 model on Food-101, run:
    python main.py fit --config configs/full/food101.yaml

2. Linear Probing:

  • To train linear probes on top of a ViT-B/16 model on Food-101, run:
    python main.py fit --config configs/linear/food101.yaml

3. Low-Rank Adaptation (LoRA):

  • To fine-tune a ViT-B/16 model using LoRA on Food-101, run:
    python main.py fit --config configs/lora/food101.yaml

4. Block Expansion:

  • To fine-tune a ViT-B/16 model using Block Expansion on Food-101, run:
    python main.py fit --config configs/block/food101.yaml
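Conceptually, Block Expansion interleaves freshly initialized copies of transformer blocks among the frozen pretrained ones and trains only the copies; zeroing each copy's residual output projections makes the expanded model reproduce the original network at initialization. The sketch below illustrates this assuming timm-style ViT blocks (with attn.proj and mlp.fc2 submodules); it is a conceptual example, not this repository's exact implementation.

    import copy
    import torch.nn as nn

    def expand_blocks(blocks: nn.ModuleList, p: int) -> nn.ModuleList:
        """Insert p identity-initialized block copies, evenly spaced; freeze the rest."""
        group = len(blocks) // p
        expanded = []
        for i, blk in enumerate(blocks):
            blk.requires_grad_(False)  # pretrained blocks stay frozen
            expanded.append(blk)
            inserted = len(expanded) - (i + 1)
            if (i + 1) % group == 0 and inserted < p:
                new_blk = copy.deepcopy(blk)
                new_blk.requires_grad_(True)  # only the copies are trained
                # Zero the residual-branch output projections so the new block
                # is an identity mapping at initialization.
                nn.init.zeros_(new_blk.attn.proj.weight)
                nn.init.zeros_(new_blk.attn.proj.bias)
                nn.init.zeros_(new_blk.mlp.fc2.weight)
                nn.init.zeros_(new_blk.mlp.fc2.bias)
                expanded.append(new_blk)
        return nn.ModuleList(expanded)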

Training on a Custom Dataset

To train on a custom dataset, first organize the images into Image Folder format (one subdirectory per class). Then set --data.dataset custom, --data.root path/to/custom/dataset, and --data.num_classes <num-dataset-classes>, as in the sketch below.
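For example, a hypothetical two-class dataset (all directory and file names below are placeholders) would be organized as:

    path/to/custom/dataset/
        cats/
            img_001.jpg
            ...
        dogs/
            img_101.jpg
            ...

It can then be trained by reusing any of the shipped configs and overriding the data arguments:

    python main.py fit --config configs/full/food101.yaml --data.dataset custom \
        --data.root path/to/custom/dataset --data.num_classes 2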

Evaluate

To evaluate a trained model on its test set, find the path of the saved config file for the checkpoint (e.g. output/cifar10/version_0/config.yaml) and run:

python main.py test --ckpt_path path/to/checkpoint --config path/to/config

  • Note: make sure the --trainer.precision argument is set to the same value as was used during training.
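For instance, continuing the output/cifar10/version_0 example above (the checkpoint filename is a placeholder; use whatever was saved in that run's checkpoints directory):

    python main.py test --ckpt_path output/cifar10/version_0/checkpoints/best.ckpt \
        --config output/cifar10/version_0/config.yaml --trainer.precision 16-mixed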

Results

All results are from fine-tuned ViT-B/16 models that were pretrained on ImageNet-21k (--model.model_name vit-b16-224-in21k). The CIFAR-100 and IN-1k columns report classification accuracy (%), and Mean is their average.


Standard Fine-tuning

Model    # Params   CIFAR-100   IN-1k   Mean    Config
All      85.9 M     88.13       25.24   56.69   Link
Top-3    21.3 M     84.56       74.15   79.36   Link
Linear   76.9 K     80.57       76.11   78.34   Link

LoRA

Model    # Params   CIFAR-100   IN-1k   Mean    Config
r=4      301 K      87.91       66.82   77.37   Link
r=8      448 K      88.27       65.99   77.13   Link
r=16     743 K      87.84       65.06   76.45   Link

Block Expansion

Model    # Params   CIFAR-100   IN-1k   Mean    Config
p=1      7.2 M      82.72       75.75   79.24   Link
p=2      14.3 M     86.70       75.54   81.12   Link
p=3      21.3 M     88.58       74.61   81.60   Link
p=4      28.4 M     89.09       72.28   80.69   Link

BibTeX

You can cite this work as follows:

@inproceedings{AkbarianBafghi2024ParameterEF,
  title={Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic Forgetting},
  author={Reza Akbarian Bafghi and Nidhin Harilal and Claire Monteleoni and Maziar Raissi},
  year={2024},
  url={https://api.semanticscholar.org/CorpusID:269430713}
}
