CustomConcept101

We release a dataset of 101 concepts, with 3-15 images per concept, for evaluating model customization methods. For a more detailed view of the target images, please refer to our webpage.


Download dataset

pip install gdown
gdown 1jj8JMtIS5-8vRtNtZ2x8isieWH9yetuK
unzip benchmark_dataset.zip

Evaluation

We provide a set of text prompts for each concept in the prompts folder. The prompt file corresponding to each concept is listed in dataset.json and dataset_multiconcept.json. The CLIP-feature-based image and text similarity can be calculated as:

python evaluate.py --sample_root {folder} --target_path {target-folder} --numgen {numgen}
  • sample_root: the root folder of the generated images. It should contain a samples subfolder with the generated images, and a prompts.json file with an entry {'imagename.stem': 'text prompt'} for each image in the samples subfolder.
  • target_path: path to the target real images.
  • numgen: number of images in the sample_root/samples folder.
  • outpkl: the location to save the evaluation results (default: evaluation.pkl).
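The prompts.json layout described above can be produced with a short script. The following is a minimal sketch, not part of the released code: the helper name `write_prompts_json`, the `prompt_for_stem` callback, and the set of accepted image suffixes are all assumptions made for illustration.

```python
import json
from pathlib import Path

# Assumption: generated images use one of these suffixes.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def write_prompts_json(sample_root, prompt_for_stem):
    """Write sample_root/prompts.json and return the number of entries written.

    prompt_for_stem is a hypothetical callback that maps an image file stem
    (file name without extension) to the text prompt used to generate it.
    """
    root = Path(sample_root)
    # Collect the image files in the samples subfolder, in a stable order.
    images = sorted(
        p for p in (root / "samples").iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    # One {'imagename.stem': 'text prompt'} entry per image.
    prompts = {p.stem: prompt_for_stem(p.stem) for p in images}
    with open(root / "prompts.json", "w") as f:
        json.dump(prompts, f, indent=2)
    return len(prompts)
```

If every image has a unique stem, the returned count should match the value passed as numgen.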

Results

We compare our method (Custom Diffusion) with DreamBooth and Textual Inversion on this dataset. We trained DreamBooth and Textual Inversion with the hyperparameters suggested in their respective papers. Both our method and DreamBooth are trained with generated images as regularization.

Single concept

| Method | Textual-alignment (CLIP), 200 DDPM | Image-alignment (CLIP), 200 DDPM | Image-alignment (DINO), 200 DDPM | Textual-alignment (CLIP), 50 DDPM | Image-alignment (CLIP), 50 DDPM | Image-alignment (DINO), 50 DDPM |
|---|---|---|---|---|---|---|
| Textual Inversion | 0.6126 | 0.7524 | 0.5111 | 0.6117 | 0.7530 | 0.5128 |
| DreamBooth | 0.7522 | 0.7520 | 0.5533 | 0.7514 | 0.7521 | 0.5541 |
| Custom Diffusion (Ours) | 0.7602 | 0.7440 | 0.5311 | 0.7583 | 0.7456 | 0.5335 |
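For readers unfamiliar with alignment scores of this kind, here is a minimal sketch of how they could be computed from precomputed feature vectors. It assumes that textual-alignment is the mean cosine similarity between each generated image's feature and its prompt's feature, and that image-alignment is the mean pairwise cosine similarity between generated and target image features. This README does not confirm those definitions, and the function names are illustrative rather than taken from evaluate.py.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def text_alignment(image_feats, text_feats):
    """Mean similarity between each generated image and its own prompt.

    Assumes image_feats[i] and text_feats[i] belong to the same sample.
    """
    scores = [cosine_similarity(i, t) for i, t in zip(image_feats, text_feats)]
    return sum(scores) / len(scores)


def image_alignment(generated_feats, target_feats):
    """Mean similarity over all (generated image, target image) pairs."""
    scores = [
        cosine_similarity(g, t)
        for g in generated_feats
        for t in target_feats
    ]
    return sum(scores) / len(scores)
```

In practice the feature vectors would come from a CLIP or DINO encoder; plain Python lists stand in for them here so the sketch has no dependencies.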

Multiple concept

| Method | Textual-alignment (CLIP), 200 DDPM | Image-alignment (CLIP), 200 DDPM | Image-alignment (DINO), 200 DDPM | Textual-alignment (CLIP), 50 DDPM | Image-alignment (CLIP), 50 DDPM | Image-alignment (DINO), 50 DDPM |
|---|---|---|---|---|---|---|
| DreamBooth | 0.7383 | 0.6625 | 0.3816 | 0.7366 | 0.6636 | 0.3849 |
| Custom Diffusion (Opt) | 0.7627 | 0.6577 | 0.3650 | 0.7599 | 0.6595 | 0.3684 |
| Custom Diffusion (Joint) | 0.7567 | 0.6680 | 0.3760 | 0.7534 | 0.6704 | 0.3799 |

Evaluation prompts

We used ChatGPT to generate 40 image captions for each concept, with instructions to either (1) change the background of the scene while keeping the main subject, (2) insert a new object or living thing in the scene along with the main subject, (3) vary the style of the main subject, or (4) change the property or material of the main subject. The generated text prompts were then manually filtered or modified to get the final 20 prompts for each concept. A similar strategy is applied for multiple concepts. Some of the prompts are also inspired by concurrent works such as Perfusion, DreamBooth, SuTI, and BLIP-Diffusion.

License

Images taken from Unsplash are under the Unsplash License. Images captured by ourselves are released under the CC BY-SA 4.0 license. Flower category images are downloaded from Wikimedia/Flickr/Pixabay, and the links to the original images can also be found here for attribution.

Acknowledgments

We are grateful to Sheng-Yu Wang, Songwei Ge, Daohan Lu, Ruihan Gao, Roni Shechtman, Avani Sethi, Yijia Wang, Shagun Uppal, and Zhizhuo Zhou for helping with the dataset collection, and Nick Kolkin for the feedback.