Masked Video Distillation (CVPR 2023)

Official PyTorch implementation of "Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning".

News

[2023.5.21] Pretrained models have been released in MODEL_ZOO.md.

[2023.4.9] The MVD code is now available!

[2023.2.28] MVD has been accepted to CVPR 2023.

Main Results

Something-Something V2

Method	Pretrain Video Data	Backbone	Teacher	Epoch	Top-1	Top-5	Resolution	#Frames x Clips x Crops	Param
MVD	Kinetics-400	ViT-S	ViT-B	400	70.7	92.6	224	16x2x3	22M
MVD	Kinetics-400	ViT-S	ViT-L	400	70.9	92.8	224	16x2x3	22M
MVD	Kinetics-400	ViT-B	ViT-B	400	72.5	93.6	224	16x2x3	87M
MVD	Kinetics-400	ViT-B	ViT-L	400	73.7	94.0	224	16x2x3	87M
MVD	Kinetics-400	ViT-L	ViT-L	400	76.1	95.4	224	16x2x3	305M
MVD	Kinetics-400	ViT-L	ViT-L	800	76.7	95.5	224	16x2x3	305M
MVD	Kinetics-400	ViT-H	ViT-H	800	77.3	95.7	224	16x2x3	633M
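
The "#Frames x Clips x Crops" column describes multi-view testing: an entry such as 16x2x3 means each view holds 16 frames, and 2 temporal clips times 3 spatial crops give 6 views per video. The sketch below shows one common way to turn per-view outputs into a single video-level prediction, by averaging softmax probabilities over the views. It is only an illustration: the function names and toy logits are hypothetical, and averaging probabilities rather than raw logits is an assumption, not necessarily what the evaluation scripts in this repository do.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one view's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(view_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average per-view class probabilities into one video-level prediction.

    Assumption: views are combined by a plain mean of softmax outputs.
    """
    probs = [softmax(v) for v in view_logits]
    num_views = len(probs)
    num_classes = len(probs[0])
    return [sum(p[c] for p in probs) / num_views for c in range(num_classes)]


# 16x2x3: 16 frames per view, 2 clips x 3 crops = 6 views per video.
num_clips, num_crops = 2, 3

# Hypothetical logits for a 3-class toy problem, one row per view.
toy_logits = [[2.0, 0.5, 0.1] for _ in range(num_clips * num_crops)]

video_probs = aggregate_views(toy_logits)
predicted_class = max(range(len(video_probs)), key=video_probs.__getitem__)
```

With the Kinetics-400 setting of 16x5x3, the same aggregation would run over 15 views per video instead of 6.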

Kinetics-400

Method	Pretrain Video Data	Backbone	Teacher	Epoch	Top-1	Top-5	Resolution	#Frames x Clips x Crops	Param
MVD	Kinetics-400	ViT-S	ViT-B	400	80.6	94.7	224	16x5x3	22M
MVD	Kinetics-400	ViT-S	ViT-L	400	81.0	94.8	224	16x5x3	22M
MVD	Kinetics-400	ViT-B	ViT-B	400	82.7	95.4	224	16x5x3	87M
MVD	Kinetics-400	ViT-B	ViT-L	400	83.4	95.8	224	16x5x3	87M
MVD	Kinetics-400	ViT-L	ViT-L	400	86.0	96.9	224	16x5x3	305M
MVD	Kinetics-400	ViT-L	ViT-L	800	86.4	97.0	224	16x5x3	305M
MVD	Kinetics-400	ViT-H	ViT-H	800	87.3	97.4	224	16x5x3	633M

AVA v2.2

Method	Pretrain Video Data	Extra Label	Backbone	Teacher	Epoch	mAP	#Frames x Sample Rate	Param
MVD	Kinetics-400	✗	ViT-B	ViT-B	400	29.3	16x4	87M
MVD	Kinetics-400	✓	ViT-B	ViT-B	400	33.6	16x4	87M
MVD	Kinetics-400	✗	ViT-B	ViT-L	400	31.1	16x4	87M
MVD	Kinetics-400	✓	ViT-B	ViT-L	400	34.2	16x4	87M
MVD	Kinetics-400	✗	ViT-L	ViT-L	800	37.7	16x4	305M
MVD	Kinetics-400	✓	ViT-L	ViT-L	800	38.7	16x4	305M
MVD	Kinetics-400	✗	ViT-H	ViT-H	800	40.1	16x4	633M
MVD	Kinetics-400	✓	ViT-H	ViT-H	800	41.1	16x4	633M

UCF101 & HMDB51

Method	Pretrain Video Data	Backbone	Teacher	Epoch	UCF101 Top-1	HMDB51 Top-1
MVD	Kinetics-400	ViT-B	ViT-B	400	97.0	76.4
MVD	Kinetics-400	ViT-B	ViT-L	400	97.5	79.7

Installation

Please follow the instructions in INSTALL.md.

Data Preparation

Please follow the instructions in DATASET.md for data preparation.

Pre-training

Pre-training instructions are in PRETRAIN.md.

Fine-tuning with pre-trained models

Fine-tuning instructions are in FINETUNE.md.

Model Zoo

We provide pre-trained models in MODEL_ZOO.md.

Acknowledgements

This project is built upon MAE and VideoMAE. Thanks to the contributors of these great codebases.

Citation

If this work is helpful for your research, please consider citing MVD.

@inproceedings{wang2022masked,
  title={Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning},
  author={Wang, Rui and Chen, Dongdong and Wu, Zuxuan and Chen, Yinpeng and Dai, Xiyang and Liu, Mengchen and Yuan, Lu and Jiang, Yu-Gang},
  booktitle={CVPR},
  year={2023}
}
