
Memory problem of Lisa finetuning #778

Open
lovekdl opened this issue Apr 21, 2024 · 5 comments


lovekdl commented Apr 21, 2024

I tried fine-tuning the llama-2-7b model with LoRA on an RTX 3090 (24GB), where the memory usage was only about 17GB. However, when I used the same configuration on an A100 (80GB), the memory usage soared to over 70GB. I would like to know whether this is normal and how I can reduce the memory consumption on the A100 80GB GPU.

I encountered the same issue when fine-tuning with LISA: the memory consumption on the A100 80GB was significantly higher than on the RTX 3090 24GB.

Config:
model_name_or_path=meta-llama/Llama-2-7b-hf
dataset_path=data/alpaca-gpt4
output_dir=output_models/finetuned_llama_2_7b_lora_128_batch1

exp_id=finetuned_llama_2_7b_lora_128_batch1
project_dir=$(cd "$(dirname $0)"/..; pwd)
log_dir=${project_dir}/log/${exp_id}
mkdir -p ${output_dir} ${log_dir}
use_flash_attention=0

deepspeed examples/finetune.py \
    --model_name_or_path ${model_name_or_path} \
    --dataset_path ${dataset_path} \
    --output_dir ${output_dir} --overwrite_output_dir \
    --num_train_epochs 1 \
    --learning_rate 5e-5 \
    --block_size 512 \
    --per_device_train_batch_size 1 \
    --use_lora 1 \
    --deepspeed configs/ds_config_zero2.json \
    --lora_r 128 \
    --save_aggregated_lora 1 \
    --fp16 \
    --run_name ${exp_id} \
    --validation_split_percentage 0 \
    --logging_steps 1 \
    --do_train \
    --use_flash_attention ${use_flash_attention} \
    --ddp_timeout 72000 \
    --save_steps 500000 \
    --dataloader_num_workers 1 \
    | tee ${log_dir}/train.log \
    2> ${log_dir}/train.err

GPU info: (screenshot attached in the original issue)

research4pan (Contributor) commented:

Thanks for your interest in LMFlow! I just tested the LISA script on GPUs with 48GB of memory, and the memory consumption looks good. We think the memory spike you mention may be caused by DeepSpeed, which normally pre-allocates memory before training. You may try the original script.

If the problem does not occur with the original script, you can locate the issue by turning off DeepSpeed offload (--deepspeed configs/ds_config_zero2_no_offload.json) and then ZeRO stage 2 (--deepspeed configs/ds_config_zero0_no_offload.json) to see which mechanism causes it.

Hope this information is helpful 😄
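
To make that comparison concrete, a small helper like the sketch below (hypothetical, not part of LMFlow) can record the peak GPU memory of each run via PyTorch's allocator statistics. Note it only counts memory managed by PyTorch, so nvidia-smi may report somewhat more.

import torch

def report_peak_memory(tag):
    # Peak memory held by tensors through PyTorch's caching allocator.
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"[{tag}] peak GPU memory allocated: {peak_gb:.1f} GB")

# Example usage: call torch.cuda.reset_peak_memory_stats() before training
# starts and report_peak_memory("zero2_no_offload") after it finishes.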


lovekdl commented Apr 22, 2024

@research4pan Currently I use DeepSpeed + LoRA for llama-2-7b fine-tuning, and the memory consumption is normal now.
But when I use LISA without DeepSpeed, I still see the memory spike: the GPU memory consumption grows slowly and reaches 65GB after 3000 steps.
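
For reference, this slow growth can be made visible in the training log with a small callback like the sketch below (hypothetical, not part of LMFlow; it would need to be registered on the Trainer used inside examples/finetune.py).

import torch
from transformers import TrainerCallback

class MemoryLoggerCallback(TrainerCallback):
    """Logs currently allocated GPU memory every `every_n_steps` steps."""

    def __init__(self, every_n_steps=100):
        self.every_n_steps = every_n_steps

    def on_step_end(self, args, state, control, **kwargs):
        if state.global_step % self.every_n_steps == 0:
            gb = torch.cuda.memory_allocated() / 1024 ** 3
            print(f"step {state.global_step}: {gb:.1f} GB allocated")

# e.g. trainer.add_callback(MemoryLoggerCallback(every_n_steps=100))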

script:
model_name_or_path=meta-llama/Llama-2-7b-hf
dataset_path=data/alpaca-gpt4
output_dir=output_models/finetune_lisa
lisa_activated_layers=1
lisa_interval_steps=20

gradient_checkpointing=True
use_flash_attention=0
gradient_accumulation_steps=1
block_size=256
per_device_train_batch_size=1

num_gpu=$(python -c "import torch; print(torch.cuda.device_count())")
ds_config_file=configs/ds_config_zero0_no_offload.json
if [ ${num_gpu} -ge 2 ]; then
ds_config_file=configs/ds_config_zero2_no_offload.json
fi

while [[ $# -ge 1 ]]; do
  key="$1"
  case ${key} in
    -m|--model_name_or_path)
      model_name_or_path="$2"
      shift
      ;;
    -d|--dataset_path)
      dataset_path="$2"
      shift
      ;;
    -o|--output_model_path)
      output_dir="$2"
      shift
      ;;
    --lisa_activated_layers)
      lisa_activated_layers="$2"
      shift
      ;;
    --lisa_interval_steps)
      lisa_interval_steps="$2"
      shift
      ;;
    --gradient_checkpointing)
      gradient_checkpointing="$2"
      shift
      ;;
    --deepspeed)
      ds_config_file="$2"
      shift
      ;;
    --use_flash_attention)
      use_flash_attention="$2"
      shift
      ;;
    --gradient_accumulation_steps)
      gradient_accumulation_steps="$2"
      shift
      ;;
    --block_size)
      block_size="$2"
      shift
      ;;
    --per_device_train_batch_size|--batch_size)
      per_device_train_batch_size="$2"
      shift
      ;;
    *)
      echo "error: unknown option \"${key}\"" 1>&2
      exit 1
  esac
  shift
done

exp_id=finetune
project_dir=$(cd "$(dirname $0)"/.; pwd)
log_dir=${project_dir}/log/${exp_id}
mkdir -p ${output_dir} ${log_dir}

python examples/finetune.py \
    --model_name_or_path ${model_name_or_path} \
    --dataset_path ${dataset_path} \
    --output_dir ${output_dir} --overwrite_output_dir \
    --num_train_epochs 1 \
    --learning_rate 5e-5 \
    --disable_group_texts 1 \
    --block_size ${block_size} \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --bf16 \
    --torch_dtype bfloat16 \
    --run_name finetune \
    --optim paged_adamw_32bit \
    --validation_split_percentage 0 \
    --logging_steps 5 \
    --do_train \
    --ddp_timeout 72000 \
    --save_steps 500000 \
    --dataloader_num_workers 1 \
    --gradient_checkpointing ${gradient_checkpointing} \
    --use_flash_attention ${use_flash_attention} \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --use_lisa 1 \
    --lisa_activated_layers ${lisa_activated_layers} \
    --lisa_interval_steps ${lisa_interval_steps} \
    | tee ${log_dir}/train.log \
    2> ${log_dir}/train.err

Running information: (screenshots attached in the original issue)


lovekdl commented Apr 22, 2024

I think there might be an optimizer problem.
If I call self.freeze_all_layers() in the __init__() of class DynamicLayerActivationCallback(TrainerCallback), the memory consumption stays around 17GB.
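
Roughly, that change looks like the sketch below (simplified; the real DynamicLayerActivationCallback in LMFlow contains more logic, and the constructor arguments here are only illustrative):

from transformers import TrainerCallback

class DynamicLayerActivationCallback(TrainerCallback):
    def __init__(self, n_layers, interval_steps, model):
        super().__init__()
        self.n_layers = n_layers
        self.interval_steps = interval_steps
        self.model = model
        # Workaround: freeze every layer up front so nothing is trainable
        # before the first layer switch happens.
        self.freeze_all_layers()

    def freeze_all_layers(self):
        for param in self.model.parameters():
            param.requires_grad = False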

lovekdl changed the title from "Memory problem" to "Memory problem of Lisa finetuning" on Apr 22, 2024

lovekdl commented Apr 22, 2024

It seems that memory consumption increases each time layers are activated for the first time, while re-activating layers that were already activated before does not increase it further.
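
That pattern is consistent with how AdamW-style optimizers behave in plain PyTorch: per-parameter state (exp_avg, exp_avg_sq) is created lazily the first time a parameter receives a gradient and is kept afterwards, even if the layer is frozen again. A minimal standalone demo of that behavior (not LMFlow code, just an illustration):

import torch

layers = torch.nn.ModuleList([torch.nn.Linear(64, 64) for _ in range(4)])
opt = torch.optim.AdamW(layers.parameters(), lr=1e-4)

def activate_only(idx):
    # Mimic LISA-style switching: only one layer is trainable at a time.
    for i, layer in enumerate(layers):
        for p in layer.parameters():
            p.requires_grad = (i == idx)

for step, idx in enumerate([0, 1, 0, 2]):
    activate_only(idx)
    loss = layers[idx](torch.randn(8, 64)).sum()
    loss.backward()
    opt.step()
    opt.zero_grad()
    # The state count only grows when a previously untouched layer is active.
    print(f"step {step}: active layer {idx}, params with optimizer state: {len(opt.state)}")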

research4pan (Contributor) commented:

This seems to be a problem related to DeepSpeed. We are currently implementing a model-parallelism version that reinitializes the optimizer state each time the active layers switch, which should solve this issue as well. Please stay tuned for our latest updates 😄
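
In plain PyTorch, one way that idea could look is to drop the optimizer state of layers that were just frozen whenever the active layers switch. A sketch under that assumption (not the actual LMFlow implementation; release_frozen_optimizer_state is a hypothetical helper):

import torch

def release_frozen_optimizer_state(optimizer):
    # Delete AdamW moments for parameters that are no longer trainable so
    # they do not stay resident in GPU memory after a layer switch.
    for param in list(optimizer.state.keys()):
        if not param.requires_grad:
            del optimizer.state[param]
    torch.cuda.empty_cache()  # ask the allocator to release the freed blocks

# Intended to be called right after the callback freezes the old layers and
# unfreezes the newly selected ones.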
