Hello, my model has been training for a day but progress is only 1%. Is this speed normal? #484

tomhas-hub · 2023-08-09T08:21:32Z

The model I am training is BELLE-7B-0.2M, with the school_math dataset from your examples. The training command is:
torchrun --nproc_per_node 1 src/entry_point/sft_train.py \
    --ddp_timeout 36000 \
    --model_name_or_path ${model_name_or_path} \
    --use_lora \
    --deepspeed configs/deepspeed_config_stage3.json \
    --lora_config configs/lora_config_bloom.json \
    --train_file ${train_file} \
    --validation_file ${validation_file} \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --num_train_epochs 10 \
    --model_max_length ${cutoff_len} \
    --save_strategy "steps" \
    --save_total_limit 3 \
    --learning_rate 3e-4 \
    --weight_decay 0.00001 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 10 \
    --evaluation_strategy "steps" \
    --bf16 \
    --seed 1234 \
    --gradient_checkpointing \
    --cache_dir ${cache_dir} \
    --output_dir ${output_dir}

GPU memory usage is shown below:

How can I speed up training?
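
For context on the 1% figure, the number of optimizer steps implied by the flags above can be estimated with a short calculation. This is only a rough sketch: `num_samples` and `seconds_per_step` are hypothetical placeholders rather than values from this issue, the function names are made up for illustration, and the formula simplifies how the trainer actually counts steps.

```python
import math


def total_optimizer_steps(num_samples, per_device_batch_size, num_devices,
                          grad_accum_steps, num_epochs):
    """Approximate number of optimizer steps for a full training run."""
    # Samples consumed per optimizer step across all devices.
    effective_batch = per_device_batch_size * num_devices * grad_accum_steps
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return steps_per_epoch * num_epochs


def eta_days(total_steps, seconds_per_step):
    """Convert a step count and a measured step time into days."""
    return total_steps * seconds_per_step / 86400


# Flags from the command: batch size 1, one process, no accumulation, 10 epochs.
# num_samples is a hypothetical placeholder; substitute the real dataset size.
steps = total_optimizer_steps(
    num_samples=100_000,
    per_device_batch_size=1,
    num_devices=1,
    grad_accum_steps=1,
    num_epochs=10,
)
print("optimizer steps:", steps)
# seconds_per_step is also hypothetical; read it from the training log.
print("estimated days:", eta_days(steps, seconds_per_step=2.0))
```

With an effective batch size of 1, every sample costs one full optimizer step, so the step count equals dataset size times epochs. Raising `--per_device_train_batch_size` or lowering `--num_train_epochs` reduces that count proportionally; whether wall-clock time drops by the same factor depends on available GPU memory and throughput.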
