
[Finetuning OneFormer] How to use multiple GPUs #413

Open
EricLe-dev opened this issue Apr 19, 2024 · 0 comments

EricLe-dev commented Apr 19, 2024

Dear @NielsRogge. First and foremost, thank you so much for your fantastic work. I followed your tutorial and was able to fine-tune OneFormer. However, when I tried to fine-tune the model on multiple GPUs, it did not work.

I tried two approaches:

1. Using DataParallel

import torch.nn as nn
from torch.optim import AdamW
from torch.utils.data import DataLoader

# some code the same as your tutorial
processor.image_processor.num_text = model.config.num_queries - model.config.text_encoder_n_ctx

train_dataset = CustomDataset(processor)
train_dataloader = DataLoader(train_dataset, batch_size=1, shuffle=True, num_workers=16)
optimizer = AdamW(model.parameters(), lr=5e-5)

model = nn.DataParallel(model)
device = 'cuda'
model.to(device)
model.train()

for epoch in range(20):  # loop over the dataset multiple times
    for batch in train_dataloader:
        # zero the parameter gradients
        optimizer.zero_grad()
        batch = {k:v.to(device) for k,v in batch.items()}

        # forward pass
        outputs = model(**batch)

        # backward pass + optimize
        loss = outputs.loss
        print("Loss:", loss.item())
        loss.backward()
        optimizer.step()

This code ran normally, but only GPU 0 was utilized; the other GPUs do not seem to do any work.
Here is the output of nvidia-smi while it is running:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.239.06   Driver Version: 470.239.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 55%   58C    P2   196W / 356W |  20651MiB / 24268MiB |     71%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:3C:00.0 Off |                  N/A |
| 59%   57C    P2   121W / 356W |      8MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:5E:00.0 Off |                  N/A |
| 53%   54C    P2   120W / 356W |      8MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:86:00.0 Off |                  N/A |
| 53%   47C    P2   118W / 356W |      8MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  Off  | 00000000:D8:00.0 Off |                  N/A |
| 60%   58C    P2   137W / 356W |      8MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  Off  | 00000000:D9:00.0 Off |                  N/A |
| 60%   58C    P2   111W / 356W |      8MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2170      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A   2809467      C   python                          20643MiB |
|    1   N/A  N/A      2170      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      2170      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      2170      G   /usr/lib/xorg/Xorg                  4MiB |
|    4   N/A  N/A      2170      G   /usr/lib/xorg/Xorg                  4MiB |
|    5   N/A  N/A      2170      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
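For what it's worth, one thing I noticed while debugging: `nn.DataParallel` parallelizes by scattering the input batch along dimension 0 across the visible GPUs, so with `batch_size=1` there is only one slice to hand out and the extra GPUs sit idle. A minimal pure-Python sketch of that chunking behavior (mirroring `torch.chunk` semantics, which DataParallel's scatter uses; the function name is just for illustration):

```python
import math

def dataparallel_chunks(batch_size: int, n_gpus: int) -> list[int]:
    # nn.DataParallel scatters inputs along dim 0 with torch.chunk semantics:
    # chunk size = ceil(batch_size / n_gpus); GPUs past the data get nothing.
    chunk = math.ceil(batch_size / n_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

print(dataparallel_chunks(1, 6))   # batch_size=1  -> only GPU 0 receives work
print(dataparallel_chunks(12, 6))  # batch_size=12 -> all six GPUs get 2 samples
```

If this reading is right, a per-step batch size of at least the number of GPUs would be needed before DataParallel can spread any work at all.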

2. Using Accelerate
Following this tutorial, I modified the code as follows:

from accelerate import Accelerator

processor.image_processor.num_text = model.config.num_queries - model.config.text_encoder_n_ctx

train_dataset = CustomDataset(processor)
# val_dataset = CustomDataset(processor)

train_dataloader = DataLoader(train_dataset, batch_size=1, shuffle=True, num_workers=16)
optimizer = AdamW(model.parameters(), lr=5e-5)


accelerator = Accelerator()
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)


model.train()

for epoch in range(20):  # loop over the dataset multiple times
    for batch in train_dataloader:

        # zero the parameter gradients
        optimizer.zero_grad()
        # batch = {k:v.to(device) for k,v in batch.items()}

        # forward pass
        outputs = model(**batch)

        # backward pass + optimize
        loss = outputs.loss
        print("Loss:", loss.item())
        accelerator.backward(loss)
        optimizer.step()

This code also ran normally, but again only GPU 0 was used.
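One thing I am not sure about: as far as I understand, `accelerator.prepare` only shards the dataloader and wraps the model for distributed training when the script is started via `accelerate launch` (or `torchrun`), which spawns one process per GPU; with a plain `python train.py` there is a single process, so only one GPU is used. A small sketch of how each spawned process can identify itself, using the `LOCAL_RANK` environment variable that those launchers set (running it directly, rank 0 is reported):

```python
import os

def local_rank() -> int:
    # `accelerate launch` / `torchrun` start one process per GPU and export
    # LOCAL_RANK for each; Accelerator() reads it to pick `cuda:<rank>`.
    # Under a plain `python` invocation it is unset, so a single process
    # falls back to rank 0 and everything runs on one device.
    return int(os.environ.get("LOCAL_RANK", 0))

print(f"local rank: {local_rank()} -> device cuda:{local_rank()}")
```

If my understanding is right, launching with `accelerate launch train.py` after running `accelerate config` once (multi-GPU, 6 processes) should be what spawns the extra workers.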

I'm quite sure that I'm missing something here. Can you please point me in the right direction? Thank you so much!
