Segment Anything WebUI

This project is based on the Segment Anything Model (SAM) by Meta. The UI is built with Gradio.

Change Logs

  • [2023-4-11]
    • Support video segmentation. A short video can be segmented automatically by SAM.
    • Support text prompt segmentation using the OWL-ViT (Vision Transformer for Open-World Localization) model. The text prompt is not yet released in the current SAM version, so it is implemented indirectly: OWL-ViT produces boxes from the text, and those boxes are used as prompts.
  • [2023-4-15]
    • Support points prompt segmentation. Due to this issue, using text and point prompts together may result in an error.
    • Boxes prompt: it does not seem possible to draw a box directly in Gradio. One idea is to use two points to represent the box, but this is neither accurate nor elegant. Since the text prompt already implements the box prompt indirectly, I won't implement the box prompt directly for now. If you have any ideas about box drawing in Gradio, please tell me.
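As a rough illustration of the points prompt and of the two-point box idea mentioned above, here is a minimal sketch. The helper names `build_point_prompt` and `points_to_box` are hypothetical and are not taken from this repository. The only assumption about SAM itself is that `SamPredictor.predict` takes point coordinates with labels (1 for foreground, 0 for background) and a box in XYXY order.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def build_point_prompt(
    foreground: Sequence[Point], background: Sequence[Point] = ()
) -> Tuple[List[List[float]], List[int]]:
    """Return (coords, labels) laid out the way SAM point prompts are.

    Label 1 marks a foreground click, label 0 a background click.
    """
    clicks = list(foreground) + list(background)
    coords = [[float(x), float(y)] for x, y in clicks]
    labels = [1] * len(foreground) + [0] * len(background)
    return coords, labels


def points_to_box(p1: Point, p2: Point) -> List[float]:
    """Turn two corner clicks, given in any order, into an XYXY box."""
    (x1, y1), (x2, y2) = p1, p2
    return [
        float(min(x1, x2)),
        float(min(y1, y2)),
        float(max(x1, x2)),
        float(max(y1, y2)),
    ]


# One foreground click, one background click, and two corner clicks.
coords, labels = build_point_prompt([(120, 80)], [(10, 10)])
box = points_to_box((200, 150), (50, 30))
print(coords, labels, box)
```

With a loaded predictor, these values would typically be wrapped in NumPy arrays and passed as `predictor.predict(point_coords=np.array(coords), point_labels=np.array(labels), box=np.array(box))`. Sorting the corners means the two clicks can be made in either order.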

Usage

The following steps run the demo on your own computer.

  • Install Segment Anything:
pip install git+https://github.com/facebookresearch/segment-anything.git
  • git clone this repository:
git clone https://github.com/5663015/segment_anything_webui.git
  • Make a new folder named checkpoints under this project, and put the downloaded weight files in checkpoints. You can download the weights using the following URLs:

  • Under checkpoints, make a new folder named models--google--owlvit-base-patch32, and put the downloaded OWL-ViT weight files in models--google--owlvit-base-patch32.

  • Run:

python app.py
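Before running the app, the folder layout described in the steps above can be prepared and checked with a short script. This is only a sketch: `prepare_checkpoints` is a hypothetical helper, and while the file names below are the official SAM checkpoint names, whether app.py looks for exactly these names is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List

# Official SAM checkpoint file names; that app.py expects these exact
# names is an assumption.
SAM_WEIGHTS = (
    "sam_vit_b_01ec64.pth",
    "sam_vit_l_0b3195.pth",
    "sam_vit_h_4b8939.pth",
)
OWLVIT_DIR = "models--google--owlvit-base-patch32"


def prepare_checkpoints(project_root: Path) -> List[str]:
    """Create checkpoints/ and the OWL-ViT subfolder.

    Returns the SAM weight files that are not yet in checkpoints/.
    """
    checkpoints = project_root / "checkpoints"
    (checkpoints / OWLVIT_DIR).mkdir(parents=True, exist_ok=True)
    return [name for name in SAM_WEIGHTS if not (checkpoints / name).is_file()]


# Dry run in a temporary directory so nothing is created in the real project.
with tempfile.TemporaryDirectory() as tmp:
    missing = prepare_checkpoints(Path(tmp))
    print("Missing SAM weights:", missing)
```

Calling `prepare_checkpoints(Path("."))` from the project root would create the real folders; only the default vit_b weights are needed for the default configuration.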

Note: The default model is vit_b and the default device is cpu, so the demo can run on a CPU-only machine.
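For reference, loading the default vit_b model on CPU with the segment_anything package might look like the sketch below. `checkpoint_path` and `load_predictor` are hypothetical helpers, not functions from app.py, and the mapping from model type to file name assumes the official checkpoint names placed directly under checkpoints.

```python
from pathlib import Path

# Model type -> official checkpoint file name. That the files sit directly
# under checkpoints/ with these names is an assumption.
CHECKPOINT_FILES = {
    "vit_b": "sam_vit_b_01ec64.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_h": "sam_vit_h_4b8939.pth",
}


def checkpoint_path(model_type: str = "vit_b", root: str = "checkpoints") -> Path:
    """Return the expected weight file path for a SAM model type."""
    if model_type not in CHECKPOINT_FILES:
        raise ValueError(f"unknown model type: {model_type!r}")
    return Path(root) / CHECKPOINT_FILES[model_type]


def load_predictor(model_type: str = "vit_b", device: str = "cpu"):
    """Build a SamPredictor for the given model type and device."""
    # Imported lazily so checkpoint_path works without the heavy dependencies.
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=str(checkpoint_path(model_type)))
    sam.to(device=device)
    return SamPredictor(sam)


print(checkpoint_path())
```

Switching to a larger model or a GPU would then be a matter of calling, for example, `load_predictor("vit_h", device="cuda")`, provided the matching weights have been downloaded.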

TODO

  • Video segmentation

  • Add text prompt

  • Add points prompt

  • Add boxes prompt

  • Try to combine with ControlNet and Stable Diffusion: use SAM to generate a dataset for fine-tuning ControlNet, and generate new images with SD.

Reference