Add multi-gpu support #5997
base: main
Conversation
Lincoln, please stop tempting me to buy another RTX 4090.
Should be waiting on the resume event instead of checking it in a loop
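A sketch of the suggested change, assuming the resume flag is a `threading.Event` (the names here are illustrative, not the PR's actual code):

```python
import threading
import time

resume_event = threading.Event()  # hypothetical resume flag

def wait_polling() -> None:
    # current approach: wakes up repeatedly to re-check the flag
    while not resume_event.is_set():
        time.sleep(0.1)

def wait_blocking() -> None:
    # suggested approach: sleeps until set() is called, wakes immediately
    resume_event.wait()
```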
Prefer an early return/continue to reduce the indentation of the processor loop. Easier to read. There are other ways to improve its structure but at first glance, they seem to involve changing the logic in scarier ways.
I just noticed that the changes to the way VRAM loading is handled are consuming more memory than they should. I'm going to revert to the current method and work on this in a separate PR. (These changes are not related to the multi-GPU support.)
While the code changes are not huge, this is still a very substantial change without a way to strictly feature-flag the multi-GPU handling. Properly testing this will require carefully monitored testing on a staging environment. We don't have the capacity to do that, and I can't give a solid timeline for when we will.
@lstein You're my hero! Can you hide it behind a checkbox, a setting, or an env variable? Just to merge this feature and prevent @psychedelicious from worrying too much.
@makemefeelgr8 Sorry, but it's not that simple. This change needs to wait until we can allocate resources to do thorough testing.
Summary
This adds support for systems that have multiple GPUs. On CUDA systems, it will automatically detect when a system has more than one GPU and configure the model cache and the session processor to take advantage of them, keeping track of which GPUs are busy and which are available, and rendering batches of images in parallel. It works at the session processor level by placing each session into a thread-safe queue that is monitored by multiple threads. Each thread reserves a GPU at entry, processes the entire invocation, and then releases the GPU to be used by other pending requests.
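A minimal sketch of that reserve/process/release pattern (class and function names here are illustrative, not the PR's actual code):

```python
import queue
import threading

class GPUPool:
    """Tracks which GPUs are busy; acquire() blocks until one is free."""

    def __init__(self, device_ids: list[str]):
        self._free: queue.Queue[str] = queue.Queue()
        for device_id in device_ids:
            self._free.put(device_id)

    def acquire(self) -> str:
        return self._free.get()  # blocks while all GPUs are busy

    def release(self, device_id: str) -> None:
        self._free.put(device_id)

def process_sessions(sessions: queue.Queue, pool: GPUPool) -> None:
    """Worker loop: one of these runs per processing thread."""
    while True:
        session = sessions.get()
        if session is None:  # sentinel value shuts the worker down
            break
        device = pool.acquire()  # reserve a GPU at entry
        try:
            run_session(session, device)  # process the entire invocation on that GPU
        finally:
            pool.release(device)  # hand the GPU back to other pending requests

def run_session(session, device: str) -> None:
    ...  # stand-in for the real session execution

# e.g. two GPUs served by two worker threads:
pool = GPUPool(["cuda:0", "cuda:1"])
sessions: queue.Queue = queue.Queue()
for _ in range(2):
    threading.Thread(target=process_sessions, args=(sessions, pool), daemon=True).start()
```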
Demo
cinnamon-2024-04-16T152651-0400.webm
How it works
In addition to changes in the session processor, this PR adds a few calls to the model manager's RAM cache to reserve and release GPUs in a thread-safe way, and extends the TorchDevice class to support dynamic device selection without changing its API. The PR also improves how models are moved from RAM to VRAM to modestly increase load speed. During debugging, I discovered that `uuid.uuid4()` does not appear to be thread-safe on Windows platforms (https://stackoverflow.com/questions/2759644/python-multiprocessing-doesnt-play-nicely-with-uuid-uuid4), and this was borking the latent caching system. I worked around this by adding the current thread ID to the cache object's name.
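A sketch of that workaround (the cache-key format and function name are illustrative, not the PR's exact naming scheme):

```python
import threading
import uuid

def make_cache_name() -> str:
    # uuid4 alone proved unreliable across threads on Windows, so the
    # current thread ID is mixed into the cache object's name to
    # guarantee uniqueness.
    return f"{uuid.uuid4()}-{threading.get_ident()}"
```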
There are two new options for the config file:

- `max_threads` -- the maximum number of session processing threads that can run at the same time. If not defined, this is set to the number of GPU devices.
- `devices` -- a list of devices to use for acceleration. If not defined, this is calculated dynamically to use all CUDA GPUs found.

Example:
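(The original example did not survive; a plausible `invokeai.yaml` snippet based on the option descriptions above, whose exact syntax may differ from the PR:)

```yaml
max_threads: 2
devices:
  - cuda:0
  - cuda:1
```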
Note that there is no problem if `max_threads` does not match the number of GPU devices (even on single-GPU systems), but there won't be any benefit to defining more threads than GPUs.

The code is currently tested and working using multiple threads on a 6-GPU Windows machine.
To test
First, buy yourself two RTX 4090s :-).
Seriously, though, the best thing to do is to ensure that this doesn't crash single-GPU systems. Exercise the linear and graph workflows. Try different models, LoRAs, IP adapters, upscalers, etc. Run a couple of large batches and make sure that they can be paused, resumed, and cancelled as usual.
If you have access to a system that has an integrated GPU as well as a discrete one, you can test out the multi-GPU processing simply by queueing up a series of 2 or more generation jobs.
QA Instructions
Merge Plan
Squash merge when approved.
Checklist