Could you please share some tips with your rich experience? #3452

Open
xiaochengsky opened this issue Apr 25, 2024 · 1 comment
Labels
pending This problem is yet to be addressed.

Comments

@xiaochengsky

Reminder

  • I have read the README and searched the existing issues.

Reproduction

It's an awesome project! Thank you for the wonderful contributions!

Here is an example of SFT in this repo using DeepSpeed:

```bash
deepspeed --num_gpus=8 src/train_bash.py \
    --stage sft \
    --model_name_or_path "xxx" \
    --do_train \
    --dataset alpaca_en \
    --dataset_dir ./data \
    --finetuning_type lora \
    --output_dir "xxx" \
    --overwrite_cache \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3 \
    --plot_loss \
    --fp16 \
    --template default \
    --deepspeed "scripts/ds_z3_config_lora.json"
```
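
For context on the questions below, here is my understanding of the effective batch size implied by these flags (a minimal sketch, assuming the usual HF Trainer / DeepSpeed data-parallel semantics; please correct me if this is wrong):

```python
# Minimal sketch: under data parallelism, the effective global batch size is
#   num_gpus * per_device_train_batch_size * gradient_accumulation_steps
num_gpus = 8
per_device_train_batch_size = 16
gradient_accumulation_steps = 4

effective_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 512 samples per optimizer step for the command above
```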

Here are some questions about multi-GPU fine-tuning:

  1. Does the learning rate need to be linearly scaled according to the number of GPUs and per_device_train_batch_size?
    e.g. now gpus=8, per_device_train_batch_size=16, lr=5e-5. So if gpus=4 and per_device_train_batch_size=4, lr should be ~6.25e-6, right? (A small arithmetic sketch follows the list.)

  2. Based on your rich experience, for general NLP tasks (e.g. ARC-c/ARC-e/BoolQ/HellaSwag/MMLU/OBQA/RTE/WinoGrande, and so on), how much loss reduction is considered good (e.g. lower than 1 for alpaca_en)?

  3. If the training loss decreases, does that mean the model will perform well on general NLP tasks?

  4. For base models (like Mixtral-8x7B, not Mixtral-8x7B-Instruct), does using a different template (default/alpaca/vicuna) affect their zero-shot performance on general NLP tasks?
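
For question 1, this is the arithmetic I have in mind, assuming the linear scaling rule (learning rate proportional to effective batch size); whether that rule is appropriate for LoRA SFT is exactly what I am asking:

```python
# Hypothetical sketch of the linear scaling rule: scale the learning rate by the
# ratio of effective batch sizes (gradient_accumulation_steps kept at 4 in both runs).
ref_lr = 5e-5
ref_batch = 8 * 16 * 4   # 512, the 8-GPU run above
new_batch = 4 * 4 * 4    # 64, with gpus=4 and per_device_train_batch_size=4

new_lr = ref_lr * new_batch / ref_batch
print(new_lr)  # 6.25e-06, matching the value in question 1
```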

I know you are very busy, but I am still looking forward to your reply. Thanks!

Expected behavior

No response

System Info

No response

Others

No response

@hiyouga added the pending label (This problem is yet to be addressed.) on Apr 25, 2024
@xiaochengsky
Author

Maybe I should update the first question.

  1. Does the learning rate need to be linearly scaled according to the number of GPUs and gradient_accumulation_steps (maybe per_device_train_batch_size isn't so critical, right)? See the sketch below.
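
To make the updated question concrete: my assumption is that per_device_train_batch_size and gradient_accumulation_steps enter the effective batch size symmetrically, so they should matter equally for any batch-size-based learning-rate rule. A minimal sketch under that assumption:

```python
# Both configurations below give the same effective batch size, so under a
# batch-size-based scaling rule the learning rate would be the same for both.
def effective_batch(num_gpus, per_device_bs, grad_accum):
    return num_gpus * per_device_bs * grad_accum

print(effective_batch(8, 16, 4))  # 512
print(effective_batch(8, 4, 16))  # 512, grad accum and per-device batch are interchangeable here
```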
