
RTX 3090 insane low speed #11

Open
davizca opened this issue Dec 6, 2023 · 13 comments


davizca commented Dec 6, 2023

Hi.

I'm using an RTX 3090 GPU with 24 GB of VRAM, and I think something is wrong.

[two screenshots attached]

Theoretically it should take about 3 minutes, but it doesn't.

Also posted on Reddit.

Cheers!


RuoyiDu commented Dec 6, 2023

Hi @davizca, please try setting `view_batch_size` to 16. It should work on a 3090 and will make inference faster.
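Something like this, as a minimal sketch (assuming the repo's `pipeline_demofusion_sdxl` module; the exact class and argument names may differ from your checkout):

```python
# Minimal sketch, not the exact repo invocation -- class and argument
# names are assumptions based on typical DemoFusion usage.
import torch
from pipeline_demofusion_sdxl import DemoFusionSDXLStableDiffusionPipeline

pipe = DemoFusionSDXLStableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

images = pipe(
    prompt="a photo of an astronaut riding a horse",
    height=2048,
    width=2048,
    view_batch_size=16,  # batch more views per UNet call to keep the GPU busy
)
```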


davizca commented Dec 6, 2023

Hi @RuoyiDu, thanks for the answer.

I set `view_batch_size` to 16, and for a 2048x2048 image, Phase 2 decoding is taking 12+ minutes (and the estimate keeps slowly climbing, so I guess it's the same as before). 1024x1024 runs super fast, though.

Settings and screenshots attached:

[two screenshots attached]

Cheers.


RuoyiDu commented Dec 6, 2023

Hi @davizca, this is very strange. Are you running on a laptop with an RTX 3090? The GPU's power limit also affects inference time -- I'm using an RTX 3090 in a local server with a 350W power limit. You can check the power draw with `nvidia-smi`.
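For example (a small helper sketch; the `nvidia-smi` query flags are standard, but treat the printed values below as illustrative, not real readings):

```python
# Print current GPU power draw vs. the board power limit while inference runs.
# Equivalent to: nvidia-smi --query-gpu=power.draw,power.limit --format=csv
import subprocess

result = subprocess.run(
    ["nvidia-smi", "--query-gpu=power.draw,power.limit", "--format=csv"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
# e.g. -> power.draw [W], power.limit [W]
#         347.12 W, 350.00 W   (illustrative values only)
```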


davizca commented Dec 6, 2023

Hi @RuoyiDu
No, I'm using a desktop PC with Windows and an RTX 3090.

nvidia-smi reports around 190W on average, and the board power draw shows the same. VRAM peaks at 23.6 GB while inferencing a 2048x2048 image. The part that takes forever is Phase 2 decoding (the earlier phases are fast). I don't know if this is down to some dependency, but it would be awesome if other users with an RTX 3090 could test it. I've never seen this pipeline hold a constant 350W of board power draw.



RuoyiDu commented Dec 7, 2023

Hi @davizca, on my server, it takes about 80s under full load.
[screenshot attached]

I'll try to optimise the speed of decoding. But it looks like there's some other reason it's especially slow on your end. Let's see if anyone else in the community is experiencing similar issues.


siraxe commented Dec 7, 2023

3090 on a desktop PC.
During Phase 2 decoding at 2K resolution, it spills work into shared GPU memory and slows down to an unusable point.
[two screenshots attached: terminal output and Task Manager]


RuoyiDu commented Dec 7, 2023

Hi @siraxe @davizca. Can you try generating at 2048x2048 with `multi_decoder=False`? For generating 2048x2048 images on a 3090, we don't need the tiled decoder. Then we can see whether the problem is with the tiled decoder.
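For example, something like this (a rough sketch reusing the pipeline call from above; argument names are assumptions):

```python
# Hypothetical call with the tiled decoder disabled: at 2048x2048 the full
# latent should fit in a 3090's 24 GB, so the VAE can decode it in one pass.
images = pipe(
    prompt="a photo of an astronaut riding a horse",
    height=2048,
    width=2048,
    view_batch_size=16,
    multi_decoder=False,  # decode the whole latent at once, no tiling
)
```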


siraxe commented Dec 8, 2023

> Hi @siraxe @davizca. Can you try generating at 2048x2048 with `multi_decoder=False`? For generating 2048x2048 images on a 3090, we don't need the tiled decoder. Then we can see whether the problem is with the tiled decoder.

[two screenshots attached: PyCharm and the generated image]
Okay, that helped: about 328 seconds for 50 steps 👍


RuoyiDu commented Dec 8, 2023

Thanks @siraxe! But that's still much slower than on my machine... It seems the decoder is quite slow on your PC, which makes it ridiculously slow when using the tiled decoder. I'll try to figure out the reason -- though it may be a little hard for me, since I can't reproduce this issue on my end.

BTW, I like your generation! Hope you can enjoy it!

@Yggdrasil-Engineering

I was also seeing super slow times on my 4090. Setting `multi_decoder=False` dramatically improved the speed! It's amazing what the parameters being piped in can do to generation times.

With a low batch size of 4 and multi-decoding set to True, I was seeing hour-long generation times. I'm down to 6 minutes now that I've fixed those settings! Hope this information is helpful.


davizca commented Dec 8, 2023

Hi. Thanks, everyone, for looking into this. I'm not at home right now, but I'll try the fix on Monday. The difference in inference times between @RuoyiDu and the others is weird... we'll see what's happening here ;)


RuoyiDu commented Dec 10, 2023

Hi guys @davizca @siraxe @Yggdrasil-Engineering, I found a small mistake at line #607:

```python
pad_size = self.unet.config.sample_size // 4 * 3
```

should be

```python
pad_size = self.unet.config.sample_size // 8 * 3
```

This should bring the VRAM cost in line with the paper (about 17GB) and also make decoding faster when `multi_decoder=True`.
But this bug doesn't affect the result with `multi_decoder=False`, so there might be other reasons, like GPU power (I'm using a 350W RTX 3090 rather than a 280W one).
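As a quick illustration of what the fix changes (assuming SDXL's default `unet.config.sample_size` of 128, i.e. a 1024-pixel base resolution over an 8x VAE downscale factor):

```python
# Illustrative arithmetic only -- sample_size=128 is SDXL's default.
sample_size = 128

pad_old = sample_size // 4 * 3  # 96 latent pixels of padding per tile
pad_new = sample_size // 8 * 3  # 48 latent pixels of padding per tile

print(pad_old, pad_new)  # 96 48 -- the fix halves the per-tile padding,
                         # cutting redundant decoding work and VRAM use
```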


davizca commented Dec 11, 2023

@RuoyiDu
With `multi_decoder=True` (normal settings, 2048x2048):

```
### Phase 1 Denoising ###
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [00:39<00:00, 1.27it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:22<00:00, 3.21s/it]

### Phase 1 Denoising ###
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [00:13<00:00, 3.58it/s]

### Phase 2 Denoising ###
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [02:56<00:00, 3.42s/it]

### Phase 2 Decoding ###
100%|██████████████████████████████████████████████████████████████████████████████████| 64/64 [00:23<00:00, 2.70it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [03:24<00:00, 4.09s/it]
```

With `multi_decoder=False` (same settings):
About 3:30, more or less the same. (I'll add a screenshot later.)
