In `llava/model/multimodal_encoder/timm_clip_encoder.py`, add support for ViT models from timm, such as SigLIP ([timm/ViT-SO400M-14-SigLIP-384](https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384)) and OpenCLIP. To use them, set `use_timm_vision_tower` to `True`.
Add LAION data support in `llava/train/train.py`. Two new arguments control it: `laion_path`, the path to the LAION data, and `laion_amount`, the number of LAION samples to use.
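The two arguments might be wired into the training script's data arguments roughly like this (field names taken from the description above; the actual dataclass in `llava/train/train.py` may look different):

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of the extended data arguments.
@dataclass
class DataArguments:
    data_path: Optional[str] = field(default=None)
    laion_path: Optional[str] = field(
        default=None, metadata={"help": "Path to the LAION data."})
    laion_amount: int = field(
        default=0, metadata={"help": "Number of LAION samples to use."})
```

With HuggingFace's `HfArgumentParser`, the new fields become `--laion_path` and `--laion_amount` CLI flags automatically.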
A simple startup command for pretraining (first download the timm ViT to the `vision_tower` path and download the LAION data):
```shell
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path lmsys/vicuna-13b-v1.5 \
    --version plain \
    --data_path ./playground/data/LLaVA-Pretrain/blip_laion_cc_sbu_558k.json \
    --image_folder ./playground/data/LLaVA-Pretrain/images \
    --vision_tower path-to-timm-vit \
    --mm_projector_type mlp2x_gelu \
    --tune_mm_mlp_adapter True \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-13b-pretrain \
    --num_train_epochs 1 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 24000 \
    --save_total_limit 1 \
    --learning_rate 1e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb \
    --use_timm_vision_tower True \
    --laion_path path-to-laion-data \
    --laion_amount 1000000
```