
fsdp-qlora Yi-34B-Chat throws "ValueError: Cannot flatten integer dtype tensors" #3470

Closed
hellostronger opened this issue Apr 26, 2024 · 6 comments
Labels
solved This problem has been already solved.

Comments


Reminder

  • I have read the README and searched the existing issues.

Reproduction

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
    --config_file config.yaml \
    src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path /workspace/models/Yi-34B-Chat \
    --dataset law_with_basis \
    --dataset_dir data \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir /workspace/ckpt/Yi-34B-Chat-sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --quantization_bit 4 \
    --plot_loss \
    --fp16

config.yaml

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: true
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
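
(For reference, a quick way to confirm which accelerate version and which config values are actually picked up is accelerate's own environment report; a minimal sketch, assuming the accelerate CLI is on PATH and config.yaml sits in the working directory:)

# print the accelerate version, platform info, and the parsed config
accelerate env --config_file config.yaml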

Expected behavior

FSDP + QLoRA fine-tuning of Yi-34B-Chat should run without errors.

System Info

transformers 4.39.3, torch 2.1.2, CUDA 12.1, Python 3.8

Others

[screenshots of the error traceback: ValueError: Cannot flatten integer dtype tensors]


hellostronger commented Apr 26, 2024

I have seen an existing issue from March, but I could not find any useful information there to figure out why this error occurs. Hoping for your suggestions.


hiyouga commented Apr 26, 2024

Please provide your versions of accelerate and bitsandbytes.
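
(A minimal sketch of how to collect those versions with pip; the package names are as published on PyPI:)

# show installed versions of the two packages in question
pip show accelerate bitsandbytes | grep -E "^(Name|Version)"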

hiyouga added the pending (This problem is yet to be addressed.) label on Apr 26, 2024

hellostronger commented Apr 28, 2024

@hiyouga accelerate==0.28.0, bitsandbytes==0.43.0. Are there any problems with these versions? Hoping for your suggestion.


hiyouga commented Apr 28, 2024

Did you use the latest code?
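
(If the checkout is behind, a minimal sketch of updating to the latest LLaMA-Factory code and reinstalling it in editable mode; the clone directory name is an assumption and may differ from your setup:)

cd LLaMA-Factory        # path of the existing git clone (assumed)
git pull                # fetch the latest code
pip install -e .        # reinstall the package in editable mode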


etemiz commented Apr 29, 2024

While trying to train https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-70b I was getting the same error, "ValueError: Cannot flatten integer dtype tensors".
The error was resolved after I reinstalled LLaMA-Factory. These are the versions:

accelerate 0.29.3
bitsandbytes 0.43.1
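
(If a full reinstall is not desired, pinning the same dependency versions reported above should give an equivalent environment; a sketch:)

pip install "accelerate==0.29.3" "bitsandbytes==0.43.1"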

@hellostronger
Author

@hiyouga Sorry for my late reply on this. Using the newest LLaMA-Factory code, it works correctly now.

hiyouga added the solved (This problem has been already solved.) label and removed the pending label on May 4, 2024
hiyouga closed this as completed on May 4, 2024