
[FEATURE: ADD LISA ALGORITHM] #3103

Closed · wants to merge 16 commits

Conversation

@qibaoyuan (Author)

What does this PR do?

NEW FEATURE: add the LISA algorithm (layerwise importance sampling), see https://arxiv.org/abs/2403.17919. A rough sketch of the core idea is included below.

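For readers unfamiliar with the paper: LISA keeps only a small, randomly chosen subset of layers trainable at a time and re-samples that subset at a fixed interval. A minimal sketch of that switching step follows; the names switch_active_layers, n_active, and interval are illustrative, not this PR's actual API.

# Rough sketch of the LISA switching step, assuming a decoder-only model
# whose transformer blocks live in an nn.ModuleList.
import random

import torch.nn as nn


def switch_active_layers(layers: nn.ModuleList, n_active: int) -> None:
    """Freeze every layer, then unfreeze n_active randomly chosen layers."""
    for layer in layers:
        for param in layer.parameters():
            param.requires_grad_(False)
    for idx in random.sample(range(len(layers)), n_active):
        for param in layers[idx].parameters():
            param.requires_grad_(True)


# Illustrative use inside a training loop, e.g. for a LLaMA-style model whose
# decoder blocks live in model.model.layers:
#     if step % interval == 0:
#         switch_active_layers(model.model.layers, n_active=2)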

@hiyouga (Owner) commented Apr 2, 2024

fixes: #3087

@hiyouga added the "pending" label (This problem is yet to be addressed.) on Apr 2, 2024
@hiyouga (Owner) commented Apr 3, 2024

Takes OptimalScale/LMFlow#726

@yetionyo commented Apr 5, 2024

When combining LISA with multiple GPUs, ZeRO-3, and gradient checkpointing, it fails with the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)

return torch._C._nn.flatten_dense_tensors(tensors)
single_grad_partition = self.flatten(self.averaged_gradients[sub_group_id]).to...

@qibaoyuan (Author)

I came up with the code below: the optimizer's id changes each time on_train_epoch_start is called, i.e. the optimizer is re-created. Downsides: it still needs the lightning package installed and can only be run in a separate project/Python file, something like this example: https://lightning.ai/lightning-ai/studios/code-lora-from-scratch . Further updates will be reported.

def on_train_epoch_start(self, trainer: "L.Trainer", pl_module: "pl.LightningModule"):
    # Every `epoch_interval` epochs, pick a new subset of active layers and
    # rebuild the optimizer so it only tracks the currently trainable parameters.
    if trainer.current_epoch % self.epoch_interval == 0:
        self.switch_active_layers()
        pl_module.optimizer_fn = torch.optim.Adam
        trainer.strategy.setup_optimizers(trainer)  # re-creates the optimizer (its id changes)
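For context, a self-contained version of such a callback might look roughly like the hypothetical sketch below. The class name LISACallback, its constructor arguments, and the assumption that the LightningModule's configure_optimizers builds its optimizer over only the requires_grad parameters are illustrative, not this PR's code.

# Hypothetical standalone sketch (not this PR's code): a Lightning callback that
# re-selects the active layers and asks the strategy to rebuild the optimizer.
import random

import lightning as L


class LISACallback(L.Callback):
    def __init__(self, layers, n_active: int = 2, epoch_interval: int = 1):
        self.layers = layers              # e.g. model.model.layers (an nn.ModuleList)
        self.n_active = n_active          # how many layers stay trainable
        self.epoch_interval = epoch_interval

    def switch_active_layers(self):
        # Freeze all layers, then unfreeze a random subset (the core LISA step).
        for layer in self.layers:
            for p in layer.parameters():
                p.requires_grad_(False)
        for idx in random.sample(range(len(self.layers)), self.n_active):
            for p in self.layers[idx].parameters():
                p.requires_grad_(True)

    def on_train_epoch_start(self, trainer, pl_module):
        if trainer.current_epoch % self.epoch_interval == 0:
            self.switch_active_layers()
            # Re-create the optimizer; this only helps if configure_optimizers
            # filters for p.requires_grad, so frozen layers drop out of its state.
            trainer.strategy.setup_optimizers(trainer)


# Usage: trainer = L.Trainer(callbacks=[LISACallback(model.model.layers, n_active=2)])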

@qibaoyuan (Author)

[Screenshot 2024-04-11 15:44:07: training loss curves for full, lisa_2, and lisa_32]

I have conducted experiments on llama2-7b using the full, lisa_2, and lisa_32 methods. As the screenshot above shows, the training loss decreases in all cases, and the lisa_32 curve is essentially the same as full fine-tuning.

The latest code borrows some of its implementation from LMFlow and axolotl; several details were cleaned up and a debug option was added.

Hope this will be merged.

@neteroster commented Apr 12, 2024

I tried this and noticed that fine-tuning Qwen/Qwen1.5-0.5B consumes more than 18 GB of VRAM with the following config. Is this expected?

Config
#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path Qwen/Qwen1.5-0.5B \
    --dataset mhqg_1k \
    --dataset_dir ../../data \
    --template default \
    --finetuning_type full \
    --use_lisa \
    --lisa_activated_layers 2 \
    --lisa_interval_steps 5 \
    --output_dir ../../saves/Qwen1.5-0.5B/lisa/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 3192 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps 20 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --plot_loss \
    --fp16
System Info
$ uname -a
Linux 6bf7eb606868 5.4.0-152-generic #169-Ubuntu SMP Tue Jun 6 22:23:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A5000               Off | 00000000:81:00.0 Off |                  Off |
| 30%   32C    P0              56W / 230W |      1MiB / 24564MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

@lovekdl commented Apr 19, 2024

When I used LISA to fine-tune Llama-2-7b on alpaca_gpt4_en with one A100 80 GB, memory usage increased sharply and exceeded 80 GB. I want to know how to solve this problem...

Config:
CUDA_VISIBLE_DEVICES=2 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
    --dataset alpaca_gpt4_en \
    --dataset_dir data \
    --template default \
    --finetuning_type full \
    --use_lisa 1 \
    --lisa_verbose 1 \
    --lisa_activated_layers 2 \
    --lisa_interval_steps 3 \
    --output_dir saves/Llama-2-7b-chat-lisa-2-3 \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 5 \
    --warmup_steps 0 \
    --save_steps 30000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

Error: [screenshot]

GPU info when running: [screenshot]

@lovekdl commented Apr 23, 2024

@neteroster Hello. I have the same problem as you. Have you solved it?

@neteroster

@lovekdl Not yet.

@AlexYoung757

(quoting @qibaoyuan's experiment report above)

When will this PR be merged?

@qibaoyuan closed this by deleting the head repository on May 10, 2024
@hiyouga removed the "pending" label (This problem is yet to be addressed.) on May 11, 2024