minidalle3

Technical Report | Project page | Demo (Temporarily Unavailable)

An experimental attempt to reproduce the interactive, interleaved text-to-image and text-to-text experience of DALL·E 3 and ChatGPT.

Try Yourself 🤗

  • Download the checkpoints and save them in the following layout:
checkpoints
   - models
   - sdxl_models
  • Run the following commands to launch a Gradio-based web demo:
export OPENAI_API_KEY="your key"
python -m minidalle3.web 
  • To use an LLM other than ChatGPT (for example, baichuan), run:
python -m minidalle3.llm.baichuan
export OPENAI_API_BASE="http://0.0.0.0:10039/v1"
python -m minidalle3.web

chatglm, baichuan, and internlm have been tested. llama is not supported yet. qwen has not been tested.
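
Before launching the demo, it can help to confirm that the checkpoint folders exist and to see which endpoint the environment points at. The following is a minimal sketch, not part of this repo: the function names `missing_checkpoint_dirs` and `chat_completions_url` are hypothetical, `checkpoints` is assumed to be relative to the current directory, and the `/chat/completions` path assumes the local LLM server exposes an OpenAI-compatible API.

```python
import os
from pathlib import Path

# OpenAI's public endpoint, used here when OPENAI_API_BASE is not set.
DEFAULT_API_BASE = "https://api.openai.com/v1"


def missing_checkpoint_dirs(root="checkpoints"):
    """Return the expected checkpoint sub-directories that are absent.

    The expected names follow the layout listed in this README.
    """
    expected = ("models", "sdxl_models")
    return [name for name in expected if not (Path(root) / name).is_dir()]


def chat_completions_url(env=os.environ):
    """Build the chat endpoint URL from OPENAI_API_BASE.

    Assumption: the server behind OPENAI_API_BASE is OpenAI-compatible.
    """
    base = env.get("OPENAI_API_BASE", DEFAULT_API_BASE).rstrip("/")
    return f"{base}/chat/completions"


if __name__ == "__main__":
    print("missing checkpoint dirs:", missing_checkpoint_dirs())
    print("chat endpoint:", chat_completions_url())
    print("API key set:", bool(os.environ.get("OPENAI_API_KEY")))
```

If the first line lists any folder names, the checkpoints are probably not yet in place; if the endpoint is not the one you expect, re-check the `export OPENAI_API_BASE=...` step.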

TODO

  • Support generating images interleaved in the conversation.
  • Support generating multiple images at once.
  • Support image selection.
  • Support refinement.
  • Support prompt refinement/variation.
  • Instruction-tuned LLM/SD.

Citation

If you find this repo helpful, please consider citing us.

@misc{minidalle3,
    author={Lai, Zeqiang and Zhu, Xizhou and Dai, Jifeng and Qiao, Yu and Wang, Wenhai},
    title={Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models},
    year={2023},
    url={https://github.com/Zeqiang-Lai/Mini-DALLE3},
}

Acknowledgement

IP-Adapter, Stable Diffusion XL
