pyarrow errors out when running llama-factory SFT on the deepctrl dataset; has the repo owner run into this? #8
Comments
I sampled from that dataset rather than loading it in full; see PR: hiyouga/LLaMA-Factory#3004
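The sampling approach above can be sketched as follows. This is a minimal illustration, not the repo owner's actual script: `sample_jsonl` is a hypothetical helper that reservoir-samples k records from a large JSONL file without ever holding the whole dataset in memory, so the sampled subset can then be fed to LLaMA-Factory instead of the full 10 GB+ file.

```python
import json
import random

def sample_jsonl(src_path, dst_path, k, seed=42):
    """Reservoir-sample k records from a large JSONL file.

    Hypothetical helper for illustration: reads one line at a time,
    so memory use is O(k) regardless of the source file size.
    """
    random.seed(seed)
    reservoir = []
    with open(src_path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            record = json.loads(line)
            if i < k:
                reservoir.append(record)
            else:
                # Each record is kept with probability k/(i+1).
                j = random.randint(0, i)
                if j < k:
                    reservoir[j] = record
    with open(dst_path, "w", encoding="utf-8") as f:
        for record in reservoir:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Reservoir sampling gives every record an equal chance of being selected in a single pass, which matters when the source file is too large to shuffle.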
Hi, could you share the run parameters you used for LoRA finetuning? I also sampled 5M Chinese + 1M English examples from deepctrl to finetune llama3-7b. After one epoch of training, the Chinese performance is still poor; the model can't even predict the end-of-sequence token correctly. Many thanks.
After one epoch of finetuning, the loss plateaued at 1.2 :(
env CUDA_VISIBLE_DEVICES=1,2,3 deepspeed src/train_bash.py \
--stage sft \
--do_train \
--flash_attn \
--template llama3 \
--model_name_or_path Meta-Llama-3-8B \
--dataset other_self_cognition,deepctrl-sft-data_zh,deepctrl-sft-data_en \
--finetuning_type lora \
--use_dora \
--loraplus_lr_ratio 24.0 \
--preprocessing_num_workers 40 \
--lora_rank 16 \
--lora_alpha 32 \
--lora_dropout 0.05 \
--lora_target all \
--output_dir llama3-chinese \
--overwrite_output_dir \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 32 \
--cutoff_len 8192 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 500 \
--eval_steps 500 \
--val_size 1000 \
--save_total_limit 100 \
--logging_first_step True \
--evaluation_strategy steps \
--learning_rate 5e-5 \
--warmup_ratio 0.1 \
--weight_decay 0.05 \
--num_train_epochs 1.0 \
--plot_loss \
--adam_beta1 0.9 \
--adam_beta2 0.95 \
--bf16 \
--cache_dir ./cache \
--report_to tensorboard \
--ddp_find_unused_parameters False \
--deepspeed ./deepspeed_zero_stage2_config.json
The base model is llama3-8b.
The finetuning framework is llama-factory.
The dataset is the deepctrl Chinese dataset, the 10 GB+ one.
During finetuning, pyarrow threw an error. From searching around, some people say the datasets library hits this problem when loading large datasets. How did the repo owner solve it?
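One common workaround for pyarrow errors on very large single files (an assumption here, not necessarily what the repo owner did, whose answer was to sample the dataset) is to split the JSONL into smaller shards before loading, so no single file forces pyarrow to build an oversized Arrow chunk. `shard_jsonl` and the shard size are hypothetical names chosen for illustration:

```python
import json

def shard_jsonl(src_path, dst_prefix, shard_size):
    """Split a large JSONL file into shards of at most shard_size lines.

    Hypothetical helper: each shard is written as
    {dst_prefix}-NNNNN.jsonl and the list of shard paths is returned.
    A directory of smaller files can then be passed to a dataset loader
    instead of one 10 GB+ file.
    """
    shard, idx, paths = [], 0, []

    def flush():
        nonlocal shard, idx
        path = f"{dst_prefix}-{idx:05d}.jsonl"
        with open(path, "w", encoding="utf-8") as out:
            out.writelines(shard)
        paths.append(path)
        shard, idx = [], idx + 1

    with open(src_path, encoding="utf-8") as f:
        for line in f:
            shard.append(line)
            if len(shard) >= shard_size:
                flush()
    if shard:
        flush()
    return paths
```

Another option worth trying (again an assumption about your setup) is streaming mode in the `datasets` library, i.e. `load_dataset(..., streaming=True)`, which iterates records lazily instead of materializing the full Arrow table.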