
# Stable Diffusion XL (2023)

> [Stable Diffusion XL](https://arxiv.org/abs/2307.01952)

> **Task**: Text2Image, Inpainting

## Abstract

We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: the increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.

## Pretrained models

|        Model        |    Task    | Dataset | Download |
| :-----------------: | :--------: | :-----: | :------: |
| stable_diffusion_xl | Text2Image |    -    |    -     |

We use the Stable Diffusion XL weights. The model consists of several components with separate weights, including the VAE, the UNet, and the CLIP text encoders.

You may download the weights from [stable-diffusion-xl](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and set the `from_pretrained` fields in the config to the local weights directory.
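As a minimal sketch, the snippet below downloads the weights with `huggingface_hub` and points the config at the local directory. The repo id `stabilityai/stable-diffusion-xl-base-1.0` and the `cfg.model.unet.from_pretrained` / `cfg.model.vae.from_pretrained` field names are assumptions; check the actual keys in your config file.

```python
from huggingface_hub import snapshot_download
from mmengine import Config

# Download the SDXL weights to a local cache directory.
# The repo id is an assumption; substitute the repository you actually use.
weights_dir = snapshot_download(repo_id='stabilityai/stable-diffusion-xl-base-1.0')

cfg = Config.fromfile('configs/stable_diffusion_xl/stable-diffusion_xl_ddim_denoisingunet.py')

# Point the pretrained-weight fields at the local directory.
# The exact field names depend on the config; these are illustrative.
cfg.model.unet.from_pretrained = weights_dir
cfg.model.vae.from_pretrained = weights_dir

# Save the modified config for later use.
cfg.dump('my_sdxl_config.py')
```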

## Quick Start

Running the following code, you can generate an image from a text prompt.

```python
from mmengine import MODELS, Config
from mmengine.registry import init_default_scope

# Use the mmagic registry scope so the model type in the config can be resolved.
init_default_scope('mmagic')

# Load the SDXL config shipped with the codebase.
config = 'configs/stable_diffusion_xl/stable-diffusion_xl_ddim_denoisingunet.py'
config = Config.fromfile(config).copy()

# Build the model from the config and move it to GPU.
StableDiffuser = MODELS.build(config.model)
StableDiffuser = StableDiffuser.to('cuda')

prompt = 'A mecha robot in a favela in expressionist style'

# infer() returns a dict; the generated images are under the 'samples' key.
image = StableDiffuser.infer(prompt)['samples'][0]
image.save('robot.png')
```
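The built pipeline can be reused for several prompts. The loop below relies only on the `infer()` call shown above; no extra parameters are assumed.

```python
# Reuse the built pipeline for a batch of prompts.
prompts = [
    'A mecha robot in a favela in expressionist style',
    'A watercolor painting of a lighthouse at dawn',
]
for i, p in enumerate(prompts):
    image = StableDiffuser.infer(p)['samples'][0]
    image.save(f'sample_{i}.png')
```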

## Comments

Our codebase for the stable diffusion models builds heavily on the [diffusers](https://github.com/huggingface/diffusers) codebase, and the model weights are taken from stable-diffusion-xl.

Thanks for the efforts of the community!