In `llava/model/multimodal_encoder/timm_clip_encoder.py`, add support for ViT models from timm, such as SigLIP ([timm/ViT-SO400M-14-SigLIP-384](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384)) and OpenCLIP. To use them, set `use_timm_vision_tower` to `True`.
Add LAION data support in `llava/train/train.py`. Two new arguments control it: `laion_path`, the path to the LAION data, and `laion_amount`, the number of LAION samples to use.
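The two arguments might be wired into the training script's data arguments roughly like this (field names taken from the description above; the actual dataclass in `llava/train/train.py` may look different):

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of the extended data arguments.
@dataclass
class DataArguments:
    data_path: Optional[str] = field(default=None)
    laion_path: Optional[str] = field(
        default=None, metadata={"help": "Path to the LAION data."})
    laion_amount: int = field(
        default=0, metadata={"help": "Number of LAION samples to use."})
```

With HuggingFace's `HfArgumentParser`, the new fields become `--laion_path` and `--laion_amount` CLI flags automatically.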
A simple startup command for pretraining (first download the timm ViT to the `vision_tower` path and download the LAION data):
```shell
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path lmsys/vicuna-13b-v1.5 \
    --version plain \
    --data_path ./playground/data/LLaVA-Pretrain/blip_laion_cc_sbu_558k.json \
    --image_folder ./playground/data/LLaVA-Pretrain/images \
    --vision_tower path-to-timm-vit \
    --mm_projector_type mlp2x_gelu \
    --tune_mm_mlp_adapter True \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-13b-pretrain \
    --num_train_epochs 1 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 24000 \
    --save_total_limit 1 \
    --learning_rate 1e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb \
    --use_timm_vision_tower True \
    --laion_path path-to-laion-data \
    --laion_amount 1000000
```