Add multi-gpu support #5997

Open
lstein wants to merge 31 commits into main

Conversation

@lstein (Collaborator) commented Mar 20, 2024

Summary

This adds support for systems that have multiple GPUs. On CUDA systems, it will automatically detect when a system has more than one GPU and configure the model cache and the session processor to take advantage of them, keeping track of which GPUs are busy and which are available, and rendering batches of images in parallel. It works at the session processor level by placing each session into a thread-safe queue that is monitored by multiple threads. Each thread reserves a GPU at entry, processes the entire invocation, and then releases the GPU to be used by other pending requests.
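
To illustrate the queue-and-reserve pattern described above, here is a minimal Python sketch; the names (session_queue, free_gpus, run_invocation) are illustrative only and are not the actual session processor API:

```python
import queue
import threading

DEVICES = ["cuda:0", "cuda:1"]              # assumption: two CUDA GPUs

session_queue: queue.Queue = queue.Queue()  # thread-safe queue of pending sessions
free_gpus: queue.Queue = queue.Queue()      # pool of currently available devices
for device in DEVICES:
    free_gpus.put(device)

def run_invocation(session, device) -> None:
    """Hypothetical stand-in for executing an entire invocation on one GPU."""
    ...

def worker() -> None:
    while True:
        session = session_queue.get()       # block until a session is queued
        device = free_gpus.get()            # reserve a GPU; blocks if all are busy
        try:
            run_invocation(session, device)
        finally:
            free_gpus.put(device)           # release the GPU for other pending requests
            session_queue.task_done()

# One processing thread per GPU.
for _ in DEVICES:
    threading.Thread(target=worker, daemon=True).start()
```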

Demo

cinnamon-2024-04-16T152651-0400.webm

How it works

In addition to changes in the session processor, this PR adds a few calls to the model manager's RAM cache to reserve and release GPUs in a thread-safe way, and extends the TorchDevice class to support dynamic device selection without changing its API. The PR also improves how models are moved from RAM to VRAM to increase load speed modestly. During debugging, I discovered that uuid.uuid4() does not appear to be thread-safe on Windows platforms (https://stackoverflow.com/questions/2759644/python-multiprocessing-doesnt-play-nicely-with-uuid-uuid4), and this was borking the latent caching system. I worked around this by adding the current thread ID to the cache object's name.
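
For illustration, the uuid workaround amounts to something like the following (a sketch; the actual cache-naming code in the PR may differ):

```python
import threading
import uuid

def make_cache_object_name() -> str:
    # Append the current thread ID so that two threads cannot collide on the
    # same name, even if uuid.uuid4() returns duplicate values across threads.
    return f"{uuid.uuid4()}-{threading.get_ident()}"
```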

There are two new options for the config file:

  • max_threads -- the maximum number of session-processing threads that can run at the same time. If not defined, this will be set to the number of GPU devices.
  • devices -- a list of devices to use for acceleration. If not defined, this will be dynamically calculated to use all CUDA GPUs found.

Example:

```yaml
max_threads: 3
devices:
  - cuda:0
  - cuda:1
  - cuda:4
```

Note that there is no problem if max_threads does not match the number of GPU devices (even on single-GPU systems), but there won't be any benefit to defining more threads than GPUs.
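
When devices is not set, the default behavior described above works out to roughly the following (a sketch, not the actual config code; it assumes PyTorch is installed):

```python
import torch

def default_devices() -> list[str]:
    # Use every CUDA GPU that PyTorch can see; fall back to the CPU otherwise.
    if torch.cuda.is_available():
        return [f"cuda:{i}" for i in range(torch.cuda.device_count())]
    return ["cpu"]

devices = default_devices()
max_threads = len(devices)  # max_threads defaults to the number of devices
```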

The code has been tested and is working with multiple threads on a 6-GPU Windows machine.

To test

First, buy yourself two RTX 4090s :-).

Seriously, though, the best thing to do is to ensure that this doesn't crash single-GPU systems. Exercise the linear and graph workflows. Try different models, LoRAs, IP adapters, upscalers, etc. Run a couple of large batches and make sure that they can be paused, resumed, and cancelled as usual.

If you have access to a system that has an integrated GPU as well as a discrete one, you can test out the multi-GPU processing simply by queueing up a series of 2 or more generation jobs.

QA Instructions

Merge Plan

Squash merge when approved.

Checklist

  • The PR has a short but descriptive title
  • Tests added / updated
  • Documentation added / updated

@lstein marked this pull request as draft on March 20, 2024 03:31
@github-actions bot added the python, backend, services, and python-tests labels on Mar 20, 2024
@psychedelicious (Collaborator)

Lincoln, please stop tempting me to buy another RTX 4090.

@github-actions bot added the invocations and docs labels on Mar 31, 2024
psychedelicious and others added 7 commits April 1, 2024 07:45

  • Should be waiting on the resume event instead of checking it in a loop (see the sketch below).
  • Prefer an early return/continue to reduce the indentation of the processor loop; it's easier to read. There are other ways to improve its structure, but at first glance they seem to involve changing the logic in scarier ways.
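
The first commit message above refers to the standard pattern of blocking on a threading.Event rather than polling it; a minimal sketch (the event name is illustrative, not the actual processor code):

```python
import threading

resume_event = threading.Event()  # hypothetical event, set when processing should resume

# Instead of polling in a loop:
#   while not resume_event.is_set():
#       time.sleep(0.1)
# block until the event is set:
resume_event.wait()
```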
@github-actions bot added the api label on Apr 1, 2024
@lstein marked this pull request as ready for review on April 16, 2024 20:14
@lstein changed the title from "Add draft multi-gpu support" to "Add multi-gpu support" on Apr 16, 2024
@lstein (Collaborator, Author) commented Apr 17, 2024

I just noticed that the changes to the way VRAM loading is handled are consuming more memory than they should. I'm going to revert to the current method and work on this in a separate PR. (These changes are not related to the multi-GPU support.)

@psychedelicious (Collaborator)

While the code changes are not huge, this is still a very substantial change without a way to strictly feature-flag the multi-GPU handling. Properly testing this will require carefully monitored testing on a staging environment. We don't have capacity to do that and I can't give a solid timeline for when we will.

@makemefeelgr8

@lstein You're my hero! Can you hide it behind a checkbox, a setting, or an env variable? Just to merge this feature and prevent @psychedelicious from worrying too much.

@psychedelicious (Collaborator)

@makemefeelgr8 Sorry but it's not that simple. This change needs to wait until we can allocate resources to do thorough testing.
