Able to merge 1.5B model, but unable to run eval #50
The … Also, set …
Also, I assumed … Lastly, just for info, my packages: …
I guess it's due to …
The 1.5B model used TinyLlama as its backbone. Why did you include --version phi?
Yes, my bad. Honestly, it was ignorance on my end. So I re-trained using this script: …
And then merged using: …
But while running the eval (run_tiny_llava.py) I encountered a series of errors … all of which were resolved by copy-pasting files from the finetuned model to the merged model. Is this approach incorrect?
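For context: merging LoRA weights with PEFT's merge_and_unload() writes out only the model weights and config, so tokenizer and processor files still have to be saved into the merged directory separately, which is presumably why copying them over from the finetuned checkpoint fixed the missing-file errors. A minimal sketch, assuming a PEFT-format LoRA checkpoint and hypothetical paths (this is not necessarily the repo's own merge script):

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then apply the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("bczhou/TinyLLaVA-1.5B", trust_remote_code=True)
lora = PeftModel.from_pretrained(base, "/path/to/lora_checkpoint")  # hypothetical path

# Fold the LoRA deltas into the base weights and save. This writes
# weights + config only, not the tokenizer or image-processor files.
merged = lora.merge_and_unload()
merged.save_pretrained("/path/to/merged_model")

# Save the tokenizer explicitly so eval scripts don't hit missing-file errors.
tok = AutoTokenizer.from_pretrained("/path/to/lora_checkpoint")
tok.save_pretrained("/path/to/merged_model")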
As per the instructions, we were able to merge the base model and the finetuned model. But on running eval we get this error: …
However, we do not encounter the error when we run the unmerged model directly. Why? Is this the right way?
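For what it's worth, the difference usually comes down to how the checkpoint is loaded: an unmerged LoRA checkpoint is loaded as base model plus adapter, so the base repo supplies any file the checkpoint itself lacks, while a merged checkpoint is loaded directly from its own directory. A sketch of the two paths with PEFT, under the same assumptions as above (hypothetical paths, not the repo's loader):

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Unmerged: config/tokenizer come from the base repo; the checkpoint only adds LoRA deltas.
base = AutoModelForCausalLM.from_pretrained("bczhou/TinyLLaVA-1.5B", trust_remote_code=True)
unmerged = PeftModel.from_pretrained(base, "/path/to/lora_checkpoint")  # hypothetical path

# Merged: everything must exist in the merged directory itself, so any file
# that was never copied there (tokenizer, processor, config) fails only here.
merged = AutoModelForCausalLM.from_pretrained("/path/to/merged_model", trust_remote_code=True)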
training script:
deepspeed tinyllava/train/train.py \
    --deepspeed ./scripts/tiny_llava/zero3.json \
    --lora_enable True --lora_r 32 --lora_alpha 64 \
    --model_name_or_path bczhou/TinyLLaVA-1.5B \
    --version phi \
    --data_path $DATA_PATH \
    --image_folder $IMAGE_PATH \
    --vision_tower bczhou/TinyLLaVA-1.5B-SigLIP \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length False \
    --fp16 True \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length 3072 \
    --gradient_checkpointing True \
    --dataloader_num_workers 15 \
    --lazy_preprocess True \
    --report_to wandb