
[FEATURE: ADD LISA ALGORITHM] #3103

Closed · wants to merge 16 commits

Conversation

@qibaoyuan (Author)

What does this PR do?

NEW FEATURE: add the LISA algorithm (layerwise importance sampling), see https://arxiv.org/abs/2403.17919. A rough sketch of the core idea is included below.

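For readers unfamiliar with the paper: LISA keeps only a small, randomly chosen subset of layers trainable at a time and re-samples that subset at a fixed interval. A minimal sketch of that switching step follows; the names switch_active_layers, n_active, and interval are illustrative, not this PR's actual API.

# Rough sketch of the LISA switching step, assuming a decoder-only model
# whose transformer blocks live in an nn.ModuleList.
import random

import torch.nn as nn


def switch_active_layers(layers: nn.ModuleList, n_active: int) -> None:
    """Freeze every layer, then unfreeze n_active randomly chosen layers."""
    for layer in layers:
        for param in layer.parameters():
            param.requires_grad_(False)
    for idx in random.sample(range(len(layers)), n_active):
        for param in layers[idx].parameters():
            param.requires_grad_(True)


# Illustrative use inside a training loop, e.g. for a LLaMA-style model whose
# decoder blocks live in model.model.layers:
#     if step % interval == 0:
#         switch_active_layers(model.model.layers, n_active=2)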

@hiyouga (Owner) commented Apr 2, 2024

fixes: #3087

@hiyouga added the "pending" label (This problem is yet to be addressed.) on Apr 2, 2024
@hiyouga (Owner) commented Apr 3, 2024

Takes OptimalScale/LMFlow#726

@yetionyo commented Apr 5, 2024

When combining LISA with multiple GPUs, ZeRO-3, and gradient checkpointing, it fails with the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)

return torch._C._nn.flatten_dense_tensors(tensors)
single_grad_partition = self.flatten(self.averaged_gradients[sub_group_id]).to...

@qibaoyuan (Author)

I came up with the code below: the optimizer's id changes each time on_train_epoch_start is called, i.e. the optimizer is re-created. Downsides: it still needs the lightning package installed and can only be run in a separate project/Python file, something like this example: https://lightning.ai/lightning-ai/studios/code-lora-from-scratch . Further updates will be reported.

def on_train_epoch_start(self, trainer: "L.Trainer", pl_module: "pl.LightningModule"):
    # Every `epoch_interval` epochs, pick a new subset of active layers and
    # rebuild the optimizer so it only tracks the currently trainable parameters.
    if trainer.current_epoch % self.epoch_interval == 0:
        self.switch_active_layers()
        pl_module.optimizer_fn = torch.optim.Adam
        trainer.strategy.setup_optimizers(trainer)  # re-creates the optimizer (its id changes)
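For context, a self-contained version of such a callback might look roughly like the hypothetical sketch below. The class name LISACallback, its constructor arguments, and the assumption that the LightningModule's configure_optimizers builds its optimizer over only the requires_grad parameters are illustrative, not this PR's code.

# Hypothetical standalone sketch (not this PR's code): a Lightning callback that
# re-selects the active layers and asks the strategy to rebuild the optimizer.
import random

import lightning as L


class LISACallback(L.Callback):
    def __init__(self, layers, n_active: int = 2, epoch_interval: int = 1):
        self.layers = layers              # e.g. model.model.layers (an nn.ModuleList)
        self.n_active = n_active          # how many layers stay trainable
        self.epoch_interval = epoch_interval

    def switch_active_layers(self):
        # Freeze all layers, then unfreeze a random subset (the core LISA step).
        for layer in self.layers:
            for p in layer.parameters():
                p.requires_grad_(False)
        for idx in random.sample(range(len(self.layers)), self.n_active):
            for p in self.layers[idx].parameters():
                p.requires_grad_(True)

    def on_train_epoch_start(self, trainer, pl_module):
        if trainer.current_epoch % self.epoch_interval == 0:
            self.switch_active_layers()
            # Re-create the optimizer; this only helps if configure_optimizers
            # filters for p.requires_grad, so frozen layers drop out of its state.
            trainer.strategy.setup_optimizers(trainer)


# Usage: trainer = L.Trainer(callbacks=[LISACallback(model.model.layers, n_active=2)])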

@qibaoyuan (Author)

[Screenshot 2024-04-11 15:44:07: training loss curves for full, lisa_2, and lisa_32]

I have conducted experiments on llama2-7b using the full, lisa_2, and lisa_32 methods. As the screenshot above shows, the training loss decreases in all cases, and the lisa_32 curve is essentially the same as full fine-tuning.

The latest code borrows some of its implementation from LMFlow and axolotl; several details were cleaned up and a debug option was added.

Hope this will be merged.

@neteroster commented Apr 12, 2024

I tried this and noticed that fine-tuning Qwen/Qwen1.5-0.5B consumes more than 18 GB of VRAM with the following config. Is this expected?

Config
#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path Qwen/Qwen1.5-0.5B \
    --dataset mhqg_1k \
    --dataset_dir ../../data \
    --template default \
    --finetuning_type full \
    --use_lisa \
    --lisa_activated_layers 2 \
    --lisa_interval_steps 5 \
    --output_dir ../../saves/Qwen1.5-0.5B/lisa/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 3192 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --plot_loss \
    --fp16
System Info
$ uname -a
Linux 6bf7eb606868 5.4.0-152-generic #169-Ubuntu SMP Tue Jun 6 22:23:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A5000               Off | 00000000:81:00.0 Off |                  Off |
| 30%   32C    P0              56W / 230W |      1MiB / 24564MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

@lovekdl commented Apr 19, 2024

When I used LISA to fine-tune Llama-2-7b on alpaca_gpt4_en with one A100 80 GB, memory usage increased sharply and exceeded 80 GB. I want to know how to solve this problem...

Config:
CUDA_VISIBLE_DEVICES=2 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
    --dataset alpaca_gpt4_en \
    --dataset_dir data \
    --template default \
    --finetuning_type full \
    --use_lisa 1 \
    --lisa_verbose 1 \
    --lisa_activated_layers 2 \
    --lisa_interval_steps 3 \
    --output_dir saves/Llama-2-7b-chat-lisa-2-3 \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 5 \
    --warmup_steps 0 \
    --save_steps 30000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

Error: [screenshot]

GPU info when running: [screenshot]

@lovekdl commented Apr 23, 2024

@neteroster Hello. I have the same problem as you. Have you solved it?

@neteroster

@lovekdl Not yet.

@AlexYoung757

(quoting @qibaoyuan's experiment report above)

When will this PR be merged?

@qibaoyuan closed this by deleting the head repository on May 10, 2024
@hiyouga removed the "pending" label (This problem is yet to be addressed.) on May 11, 2024