Texty Diffusion

Making images with in-scene text more realistic.

Changes relative to the original Stable Diffusion repository (https://github.com/CompVis/stable-diffusion):

  • bf16 support – many minor modifications, plus fp32 conversions for operations that don't support bf16 (e.g., interpolate)
  • DeepSpeed support
  • Hand-written training loop (main_nolightning) that is easier to read and modify than the Lightning one
  • Code restructuring to improve readability
  • LoRA adapter support for the UNet
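The idea behind the LoRA adapters can be sketched as follows. This is a minimal NumPy illustration of the general low-rank update, not the repository's implementation: a frozen weight `W` is augmented with a trainable low-rank product `B @ A`, scaled by `alpha / r`. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4.0           # hidden size, LoRA rank, scaling factor

W = rng.normal(size=(d, d))       # frozen base weight
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))              # B starts at zero, so the adapter is initially a no-op

def lora_forward(x, W, A, B, alpha, r):
    # y = x W^T + (alpha / r) * x A^T B^T
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(1, d))
y = lora_forward(x, W, A, B, alpha, r)
assert np.allclose(y, x @ W.T)    # with B = 0, output matches the frozen layer
```

Only `A` and `B` are trained, which keeps the number of trainable UNet parameters small compared to full fine-tuning.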

Requirements

A suitable conda environment named ldm can be created and activated with:

conda env create -f environment.yaml
conda activate ldm

Weights

Stable Diffusion currently provides the following checkpoints:

  • sd-v1-1.ckpt: 237k steps at resolution 256x256 on laion2B-en. 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
  • sd-v1-2.ckpt: Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on laion-aesthetics v2 5+ (a subset of laion2B-en with estimated aesthetics score > 5.0, and additionally filtered to images with an original size >= 512x512, and an estimated watermark probability < 0.5. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using the LAION-Aesthetics Predictor V2).
  • sd-v1-3.ckpt: Resumed from sd-v1-2.ckpt. 195k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
  • sd-v1-4.ckpt: Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling steps show the relative improvements of the checkpoints (see the sd evaluation results figure).
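The two ideas mentioned above – dropping the text conditioning 10% of the time during training, and combining predictions at a guidance scale during sampling – can be sketched in a few lines. This is a generic illustration of classifier-free guidance, not code from this repository; the function names are made up:

```python
import numpy as np

def drop_text_conditioning(prompt, p=0.1, rng=None):
    # During training, replace the caption with the empty string with probability p,
    # so the model also learns an unconditional prediction
    rng = rng or np.random.default_rng()
    return "" if rng.random() < p else prompt

def cfg_combine(eps_uncond, eps_cond, scale):
    # Classifier-free guidance at sampling time: extrapolate from the
    # unconditional noise prediction toward the conditional one
    return eps_uncond + scale * (eps_cond - eps_uncond)

eps_u, eps_c = np.zeros(4), np.ones(4)
assert np.allclose(cfg_combine(eps_u, eps_c, 1.0), eps_c)   # scale 1 = purely conditional
```

Higher scales (e.g., 7.5) push the sample further toward the text condition, which is why the evaluations sweep a range of guidance scales.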

Text-to-Image with Stable Diffusion


Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder. We provide a reference script for sampling, but there also exists a diffusers integration, where we expect more active community development.

Texty Caps Dataset

Mining from LAION-5B:

  1. Run the download script:

python scripts/download_laion_sharded.py \
    --input_dir "data/text-laion-20M" \
    --output_dir "data/text-laion-20M-images" \
    --shard_size 100000 \
    --num_shards 100

  2. Run the following script to apply an OCR system to the images and filter out the ones that don't have text:

python scripts/ocr.py \
    --input_dir "data/text-laion-20M-images" \
    --output_dir "data/text-laion-20M-images-with-text" \
    --num_shards 100
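The sharding implied by `--shard_size` and `--num_shards` can be sketched as follows. This is a hypothetical illustration of how a dataset might be split into fixed-size shards; the actual scripts may partition differently:

```python
def shard_slices(num_items, shard_size, num_shards):
    # Yield (shard_index, start, end) ranges covering at most
    # num_shards * shard_size items, stopping early if the data runs out
    for i in range(num_shards):
        start = i * shard_size
        if start >= num_items:
            break
        yield i, start, min(start + shard_size, num_items)

# Example: 250 items split into shards of 100
slices = list(shard_slices(num_items=250, shard_size=100, num_shards=100))
assert slices == [(0, 0, 100), (1, 100, 200), (2, 200, 250)]
```

Processing each shard independently makes the download and OCR passes easy to resume and to parallelize across workers.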


BibTeX

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

About

Text generation for Stable Diffusion by replacing the text backbone with T5-XL
