Fine-tuning is extremely slow after switching to fp8 #355

Open
lhtpluto opened this issue Jul 15, 2023 · 0 comments

Comments


lhtpluto commented Jul 15, 2023

I changed finetune_moss.py as follows:
accelerator = Accelerator(mixed_precision='fp8')

The environment is NVIDIA's container nvcr.io/nvidia/pytorch:23.06-py3
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch

Because the GPU does not have enough memory, I am using DeepSpeed with offloading to the CPU.

I modified sft.yaml as follows:

command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp8
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
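
For anyone reading along, the settings that matter for this question are the mixed-precision mode and the two offload targets. Here is a minimal sketch that pulls them out, assuming the YAML has already been loaded into a plain dict (for example with a YAML loader). The helper name `precision_and_offload` is hypothetical and is not part of accelerate or DeepSpeed.

```python
def precision_and_offload(cfg):
    """Pick out the mixed-precision mode and any DeepSpeed CPU offload targets."""
    ds = cfg.get("deepspeed_config") or {}
    # Collect the offload keys that point at the CPU.
    offloaded = [
        key
        for key in ("offload_optimizer_device", "offload_param_device")
        if ds.get(key) == "cpu"
    ]
    return {
        "mixed_precision": cfg.get("mixed_precision"),
        "zero_stage": ds.get("zero_stage"),
        "cpu_offload": offloaded,
    }


# Values copied from the sft.yaml in this post, written as the dict a loader would return.
cfg = {
    "distributed_type": "DEEPSPEED",
    "mixed_precision": "fp8",
    "deepspeed_config": {
        "offload_optimizer_device": "cpu",
        "offload_param_device": "cpu",
        "zero_stage": 3,
    },
}

summary = precision_and_offload(cfg)
print(summary)
```

With this config, both the optimizer state and the parameters are offloaded to the CPU under ZeRO stage 3 while fp8 is requested, which is the combination the question is about.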

After I set fine-tuning to use fp8, training became slower. What is going on?
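
To put a number on "slower", one option is to time the same number of training steps under each `mixed_precision` setting. The sketch below is only an illustration: `time_steps` and `step_fn` are hypothetical names, and `step_fn` is assumed to wrap one full training step from finetune_moss.py (forward, backward, optimizer step).

```python
import time
from statistics import mean


def time_steps(step_fn, n_steps=20, warmup=3):
    """Return the mean wall-clock seconds per call of step_fn, after some warmup calls."""
    # Warmup calls are not timed, so one-off setup cost does not skew the mean.
    for _ in range(warmup):
        step_fn()
    durations = []
    for _ in range(n_steps):
        start = time.perf_counter()
        step_fn()
        durations.append(time.perf_counter() - start)
    return mean(durations)


# Stand-in workload so the sketch runs on its own; replace with a real training step.
# Assumption: on a GPU, step_fn should end with torch.cuda.synchronize() so that
# queued kernels are included in the measurement.
def dummy_step():
    sum(i * i for i in range(10_000))


seconds_per_step = time_steps(dummy_step, n_steps=5, warmup=1)
print(f"{seconds_per_step:.6f} s/step")
```

Running this once with `mixed_precision: fp8` and once with another setting, with everything else unchanged, would give two seconds-per-step figures to compare.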

DeepSpeed v0.9.5
FP8 unittest for H100 by @jomayeri in microsoft/DeepSpeed#3731

Could it be that, with DeepSpeed offloading to the CPU, the slowdown is caused by the CPU not supporting fp8? My CPU is an Intel® Xeon® w9-3495X Processor.
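
Separately from the CPU question, it may be worth confirming that the GPU itself has FP8 hardware. This is a rough sketch, assuming PyTorch is installed (as it is in the NGC container). The 8.9 threshold is an assumption based on Ada (compute capability 8.9) and Hopper (9.0) being the NVIDIA architectures with FP8 Tensor Cores, and both helper names are hypothetical.

```python
def has_fp8_tensor_cores(capability):
    """Return True if a (major, minor) compute capability is 8.9 or newer."""
    major, minor = capability
    # Assumption: FP8 Tensor Cores start at compute capability 8.9 (Ada) and 9.0 (Hopper).
    return (major, minor) >= (8, 9)


def local_gpu_capability():
    """Return the current GPU's (major, minor) capability, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


cap = local_gpu_capability()
if cap is None:
    print("No CUDA device (or PyTorch) found")
else:
    print(f"compute capability {cap}: FP8 tensor cores = {has_fp8_tensor_cores(cap)}")
```

This only reports what the GPU hardware can do. It says nothing about how the CPU-offloaded optimizer step is executed.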

lhtpluto changed the title from "Training is extremely slow after switching to fp8" to "Fine-tuning is extremely slow after switching to fp8" on Jul 15, 2023