Question about the refiner step #2

Open
lifeisboringsoprogramming opened this issue Jul 12, 2023 · 8 comments
Comments

@lifeisboringsoprogramming

I am reading the SDXL paper and found that the refiner is applied to the latent image:

[figure: excerpt from the SDXL paper showing the refiner applied in latent space]

but in your code,

images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=images, num_inference_steps=steps, strength=refiner_strength).images

the input is the images instead of the latents.
Are they the same?

thanks

@TonyLianLong
Owner

I believe it will re-encode so it's applied on the latents.

The implementation shows that images are transformed to latents prior to processing: https://github.com/huggingface/diffusers/blob/af48bf200860d8b83fe3be92b2d7ae556a3b4111/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py#L841

I believe this is their recommended way to do refinement, as this is in their PR examples.
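
For reference, a minimal sketch of that image-input path (not this repo's exact code; the model IDs, prompts, step count, and strength below are placeholder assumptions):

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

# Base (text-to-image) and refiner (image-to-image) pipelines.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe_refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of an astronaut riding a horse"  # placeholder
negative = "blurry, low quality"                   # placeholder
steps, refiner_strength = 50, 0.3                  # placeholder values

# The base pass decodes its latents into PIL images ...
images = pipe(prompt=prompt, negative_prompt=negative,
              num_inference_steps=steps).images

# ... and the img2img refiner VAE-encodes those images back into latents
# (prepare_latents in the pipeline linked above) before refining, so the
# refinement itself still runs in latent space.
images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=images,
                      num_inference_steps=steps, strength=refiner_strength).images
```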

@lifeisboringsoprogramming
Author

Thank you for your reply.
The PR uses

images = pipe(prompt=prompt, output_type="latent").images

before the refiner.

There is an output_type="latent" parameter.
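
A minimal sketch of that latent hand-off, assuming the same `pipe`/`pipe_refiner` and placeholder values as the sketch earlier in this thread:

```python
# output_type="latent" keeps the base output in latent space, so the
# decode -> re-encode round trip before the refiner is skipped.
latents = pipe(prompt=prompt, negative_prompt=negative,
               num_inference_steps=steps, output_type="latent").images

# The img2img pipeline detects the 4-channel latent input and skips
# VAE encoding, refining the latents directly.
images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=latents,
                      num_inference_steps=steps, strength=refiner_strength).images
```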

How much VRAM is needed if we move the pipe to CUDA instead of using enable_model_cpu_offload?

I do not have enough VRAM, so I cannot test it myself.

Thanks

@TonyLianLong
Owner

TonyLianLong commented Jul 12, 2023

there is an output_type="latent" parameter

In that case you are right: the step of converting the latents to images can be skipped.

How much VRAM is needed if we move the pipe to cuda instead of enable_model_cpu_offload?

With 4 images, it takes about 24 GB of GPU memory.
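
For context, a minimal sketch of the two memory strategies being compared; the `low_vram` flag is just an illustrative assumption, and `enable_model_cpu_offload` requires `accelerate` to be installed:

```python
low_vram = True  # illustrative toggle, not a flag from this repo

if low_vram:
    # Sub-models stay on the CPU and are moved to the GPU only while they
    # run: slower, but fits in much less VRAM.
    pipe.enable_model_cpu_offload()
    pipe_refiner.enable_model_cpu_offload()
else:
    # Everything resident on the GPU: fastest, but per the comment above,
    # a batch of 4 images needs roughly a 24 GB card.
    pipe.to("cuda")
    pipe_refiner.to("cuda")
```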

@lifeisboringsoprogramming
Author

I have 12 GB of VRAM and cannot generate even one image with the pipe on CUDA. Thank you so much.

@TonyLianLong
Owner

Two updates:

  1. If the intermediate images are not needed (i.e., we don't want to compare before/after), no decoding and re-encoding are performed between the base generation and refinement stages.
  2. Offloading can be controlled with environment variables (a rough sketch follows below).
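
As an illustration of update 2, offloading could be gated on an environment variable along these lines; the variable name `OFFLOAD_MODEL` is hypothetical, not necessarily what this repo actually reads:

```python
import os

# Hypothetical variable name, for illustration only.
if os.environ.get("OFFLOAD_MODEL", "false").lower() in ("1", "true", "yes"):
    pipe.enable_model_cpu_offload()        # low-VRAM path
    pipe_refiner.enable_model_cpu_offload()
else:
    pipe.to("cuda")                        # keep everything on the GPU
    pipe_refiner.to("cuda")
```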

@lifeisboringsoprogramming
Author

[comparison image: refiner-latent-vs]

I got some results:
on the left: using images as the refiner input
on the right: using latents as the refiner input

The head of the middle guy has some differences.

Thanks

@TonyLianLong
Owner

Thanks for this example. Is using images consistently better than using latents?

@lifeisboringsoprogramming
Author

I did not do any more testing on that.
I think running the refiner only after seeing how the base picture looks is a better workflow for me.
