Question about the refiner step #2

Open
lifeisboringsoprogramming opened this issue Jul 12, 2023 · 8 comments
Comments

@lifeisboringsoprogramming

I am reading the SDXL paper and found that the refiner is applied to the latent image:

[figure: excerpt from the SDXL paper showing the refiner applied in latent space]

but in your code,

images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=images, num_inference_steps=steps, strength=refiner_strength).images

the input is the images instead of the latents.
Are they the same?

thanks

@TonyLianLong
Owner

I believe it will re-encode so it's applied on the latents.

The implementation shows that images are transformed to latents prior to processing: https://github.com/huggingface/diffusers/blob/af48bf200860d8b83fe3be92b2d7ae556a3b4111/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py#L841

I believe this is their recommended way to do refinement, as this is in their PR examples.
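
For reference, a minimal sketch of that image-input path (not this repo's exact code; the model IDs, prompts, step count, and strength below are placeholder assumptions):

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

# Base (text-to-image) and refiner (image-to-image) pipelines.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe_refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of an astronaut riding a horse"  # placeholder
negative = "blurry, low quality"                   # placeholder
steps, refiner_strength = 50, 0.3                  # placeholder values

# The base pass decodes its latents into PIL images ...
images = pipe(prompt=prompt, negative_prompt=negative,
              num_inference_steps=steps).images

# ... and the img2img refiner VAE-encodes those images back into latents
# (prepare_latents in the pipeline linked above) before refining, so the
# refinement itself still runs in latent space.
images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=images,
                      num_inference_steps=steps, strength=refiner_strength).images
```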

@lifeisboringsoprogramming
Author

Thank you for your reply.
The PR uses

images = pipe(prompt=prompt, output_type="latent").images

before the refiner.

There is an output_type="latent" parameter.
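
A minimal sketch of that latent hand-off, assuming the same `pipe`/`pipe_refiner` and placeholder values as the sketch earlier in this thread:

```python
# output_type="latent" keeps the base output in latent space, so the
# decode -> re-encode round trip before the refiner is skipped.
latents = pipe(prompt=prompt, negative_prompt=negative,
               num_inference_steps=steps, output_type="latent").images

# The img2img pipeline detects the 4-channel latent input and skips
# VAE encoding, refining the latents directly.
images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=latents,
                      num_inference_steps=steps, strength=refiner_strength).images
```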

How much VRAM is needed if we move the pipe to CUDA instead of using enable_model_cpu_offload?

I do not have enough VRAM, so I cannot test it myself.

Thanks

@TonyLianLong
Owner

TonyLianLong commented Jul 12, 2023

there is an output_type="latent" parameter

In that case you are right: the step of converting the latents to images can be skipped.

How much VRAM is needed if we move the pipe to cuda instead of enable_model_cpu_offload?

With 4 images, it takes about 24 GB of GPU memory.
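
For context, a minimal sketch of the two memory strategies being compared; the `low_vram` flag is just an illustrative assumption, and `enable_model_cpu_offload` requires `accelerate` to be installed:

```python
low_vram = True  # illustrative toggle, not a flag from this repo

if low_vram:
    # Sub-models stay on the CPU and are moved to the GPU only while they
    # run: slower, but fits in much less VRAM.
    pipe.enable_model_cpu_offload()
    pipe_refiner.enable_model_cpu_offload()
else:
    # Everything resident on the GPU: fastest, but per the comment above,
    # a batch of 4 images needs roughly a 24 GB card.
    pipe.to("cuda")
    pipe_refiner.to("cuda")
```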

@lifeisboringsoprogramming
Author

I have 12 GB of VRAM and cannot generate even one image with the pipe on CUDA. Thank you so much.

@TonyLianLong
Owner

Two updates:

  1. If the intermediate images are not needed (i.e., we don't want to compare before/after), no decoding and re-encoding are performed between the base generation and refinement stages.
  2. Offloading can be controlled with environment variables (a rough sketch follows below).
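
As an illustration of update 2, offloading could be gated on an environment variable along these lines; the variable name `OFFLOAD_MODEL` is hypothetical, not necessarily what this repo actually reads:

```python
import os

# Hypothetical variable name, for illustration only.
if os.environ.get("OFFLOAD_MODEL", "false").lower() in ("1", "true", "yes"):
    pipe.enable_model_cpu_offload()        # low-VRAM path
    pipe_refiner.enable_model_cpu_offload()
else:
    pipe.to("cuda")                        # keep everything on the GPU
    pipe_refiner.to("cuda")
```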

@lifeisboringsoprogramming
Author

[comparison image: refiner-latent-vs]

I got some results:
on the left: using images as the refiner input
on the right: using latents as the refiner input

The head of the middle guy has some differences.

Thanks

@TonyLianLong
Owner

Thanks for this example. Is using images consistently better than using latents?

@lifeisboringsoprogramming
Author

I did not do any more testing on that.
I think running the refiner only after seeing how the base picture looks is a better workflow for me.
