
Auto-GPT with CLIP vision for GPT-4 and GPT-3.5


A pseudo-multimodal Auto-GPT

(We're essentially just prompting GPT with CLIP's "opinion" tokens for what it "sees" in an image, which is nevertheless surprisingly effective!)

  • GPT autonomously prompts text-to-image and text-to-3D with infinite variations
  • GPT visual websearch based on image content
  • your_idea_here

Please tag me on Twitter: @zer0int1 if you use this for anything; I'd love to see it!

Demo videos:


Steps to install / use

Prerequisite:

Optional Prerequisites:

Depending on your use case, you might also need:

Other third-party credits:

🤔❗ Usage:

  1. Ensure the prerequisites (the repos above) are installed and working
  2. Put this repo (git clone it, or download and extract the ZIP) in C:/Users/JohnDoe or your equivalent "user home" folder ("z" is no longer a null byte; I removed it, so your username can contain a "z", like mine)
  3. Edit Auto-GPT/auto_gpt_workspace/CLIP.py according to your hardware (VRAM requirement table inside; yes, you can run a small CLIP with 6 GB of VRAM!)
  4. From Auto-GPT/auto_gpt_workspace, copy CLIP.py and (if applicable) SHAPE.py to your C:/Users/JohnDoe (one level above the Auto-GPT folder)
  5. Edit Auto-GPT/autogpt/visionconfig.py (should be straightforward and self-explanatory; define the absolute paths on your local system in this config)
  6. In your .env, remove the comment and set EXECUTE_LOCAL_COMMANDS=True and RESTRICT_TO_WORKSPACE=False (see the snippet after this list)
  7. Pick one of the .yaml files from Auto-GPT/ai_settings_examples (see README.TXT for details!), copy it to the main Auto-GPT folder, and rename it so you have e.g. C:/Users/JohnDoe/Auto-GPT/ai_settings.yaml
  8. (Optional) Put your own images (filenames as per the ai_settings.yaml) in the Auto-GPT/images folder (or use the example images I provided)
  9. (Optional, if running local stable diffusion) Edit Auto-GPT/auto_gpt_workspace/stablediffusion.py to match the model / config of the SD you want to use
  10. (Optional, recommended) Make sure everything is working by running the scripts independently, outside of Auto-GPT (see below)
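For step 6, the relevant lines in your .env should end up exactly like this:

EXECUTE_LOCAL_COMMANDS=True
RESTRICT_TO_WORKSPACE=False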

Verify everything works via cmd -> cd into Auto-GPT/auto_gpt_workspace and run:

python CLIPrun.py --image_path "C:/Users/JohnDoe/Auto-GPT/images/0001.png"

Replace JohnDoe with your username and wait a minute or two (depending on your GPU and the settings you made in CLIP.py) -> You should now have a CLIP opinion as tokens_0001.txt in Auto-GPT/auto_gpt_workspace
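If you'd rather script that check, here's a tiny sketch (it assumes the layout from step 2, i.e. Auto-GPT directly in your user home folder):

from pathlib import Path

# Assumes the step-2 layout: Auto-GPT sits directly in your user home folder.
tokens = Path.home() / "Auto-GPT" / "auto_gpt_workspace" / "tokens_0001.txt"
print(tokens.read_text() if tokens.exists() else "tokens_0001.txt not found - re-check CLIP.py")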

Optional:

python SHAPErun.py --prompt "A Pontiac Firebird car"

You should find s_000.png and the respective .ply in the Auto-GPT/images folder. When you run this FOR THE FIRST TIME EVER, it will download & build the Shap-E models in Auto-GPT/auto_gpt_workspace/shap_e_model_cache --> Please be patient (minutes, depending on your internet speed)!!


(Example GradCAM heatmaps: 0001-RN50x4_L1-exploemails, 0001-RN50x4_L4-sett)

🤖👀 BONUS: See what CLIP sees by computing (fast!) a heatmap highlighting which regions of the image activate most strongly for a given caption.

python manual_gradcam.py --image "0001.png" --txt "tokens_0001.txt"

Caption = the tokens CLIP 'saw' in the image (the "opinion" tokens_XXXXX.txt returned to GPT by using "run_clip" on XXXXX.png in Auto-GPT). If you're wondering WTF CLIP saw in your image, and where, run this in a separate command prompt "on the side", matching whatever GPT last used in Auto-GPT. It will dump heatmap images for all CLIP tokens, across all four saliency layers of the CLIP model, into the Auto-GPT/GradCAM folder. For GradCAM requirements, see Auto-GPT/autogpt/commands/CLIP_gradcam.py -- an adaptation of an ipynb notebook; the pip install requirements are left in as comments at the very top.

ToDo: Implement this as a "y -D" option that Auto-GPT accepts (same as "y -N"), to execute automatically after the next time run_clip is executed.
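Until that ToDo lands, here is a minimal, untested sketch for batch-running the heatmaps over every CLIP "opinion" in the workspace (the paths and the tokens_*.txt naming are taken from the steps above, and it assumes manual_gradcam.py is run from inside auto_gpt_workspace, as with CLIPrun.py; adjust if yours differ):

import subprocess
from pathlib import Path

# Assumes the step-2 layout: Auto-GPT sits directly in your user home folder.
workspace = Path.home() / "Auto-GPT" / "auto_gpt_workspace"

# For every CLIP "opinion" tokens_XXXXX.txt, render the heatmaps for XXXXX.png.
for tokens in sorted(workspace.glob("tokens_*.txt")):
    image = tokens.name.replace("tokens_", "").replace(".txt", ".png")
    subprocess.run(
        ["python", "manual_gradcam.py", "--image", image, "--txt", tokens.name],
        cwd=workspace,  # run from inside auto_gpt_workspace
        check=True,
    )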


❓ Important tips and troubleshooting, including model limitations

  • Oddly enough, the relative output path .\auto_gpt_workspace\clip_tokens.txt ensures GPT-3.5 does not get confused about where the CLIP token "opinion" is. GPT-4, however, will try to read_file from the wrong place once, in the beginning. Simply approve with y; the AI will "think" about file-not-found, correct itself, and never make that mistake again. Sorry about a small waste of GPT-4 tokens - but this is the best way to make sure it works out of the box for both GPT models.

  • Make sure the folder structure is exactly as described above. Executing subprocesses (.py files) from different locations is a delicate thing.

  • Delete the auto-gpt.json in Auto-GPT/auto_gpt_workspace if you change the .yaml or encounter issues.

  • For local stable diffusion: I am using shutil.copyfile instead of shutil.move, meaning I am trashing up your stablediffusion/outputs folder as I copy the images to Auto-GPT/images. Why? Because I couldn't figure out a way to ONLY check for existing stable diffusion images, e.g. 00001.png but NOT 0001.png. Naming the files slightly more complexly, like SD_00001.png, instead confuses GPT-3.5. So: better a bit of redundant trash than overwritten files, right? Feel free to implement something that works, if you know how - I'd be delighted! (One possible approach is sketched below.)
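If you want to take a stab at that, one possible approach (an untested sketch, not what stablediffusion.py currently does) is to match only stable diffusion's five-digit filenames, so Auto-GPT's four-digit images can never collide, and to skip anything already copied:

import re
import shutil
from pathlib import Path

# Stable diffusion outputs are named with five digits (00001.png), while
# Auto-GPT's own images use four (0001.png), so this pattern cannot match them.
SD_NAME = re.compile(r"^\d{5}\.png$")

def copy_new_sd_images(sd_outputs: Path, images_dir: Path) -> None:
    for src in sd_outputs.iterdir():
        # Copy each five-digit image exactly once; never overwrite existing files.
        if SD_NAME.match(src.name) and not (images_dir / src.name).exists():
            shutil.copyfile(src, images_dir / src.name)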


CLIP IS UNCENSORED

⚠️💣💥⚠️

  • CLIP SEES WHATEVER CLIP WANTS TO SEE (doesn't have to be related to what you see) 🤯

⚠️⚠️ WARNING ABOUT "BIAS" AND "HARMFUL" OUTPUT IN PRE-TRAINED, UNCENSORED CLIP MODELS.

While you probably shouldn't run Auto-GPT in "autonomous" mode anyway, you'll also want to ACTUALLY proof-read the GPT-generated prompt ❗❗ carefully, rather than just approving it! That is especially true if you are not running locally: spamming offensive words might just get you banned from a text-to-image API ❗❗

So a harmless image (your opinion) might lead to offensive, racist, biased, sexist output (CLIP's opinion) ❗. This is especially true if non-English text is present in the image.

  • 👉 More info on typographic attacks and why CLIP is so obsessed with text: Multimodal Neurons
  • 👉 Check the model-card.md and heed the warnings from OpenAI: CLIP Model Card

Use the above CLIPrun.py with pepe.png for an example that shouldn't be too toxic, but proves a point with regard to "oh yes, CLIP knows - CLIP was trained on the internet".
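That is (assuming pepe.png is among the example images in Auto-GPT/images):

python CLIPrun.py --image_path "C:/Users/JohnDoe/Auto-GPT/images/pepe.png"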

PS: And yes, GPT-3.5 / GPT-4 will accept these terms and make a prompt with them. They might conclude "the CLIP opinion is not very useful" and try to do something else; however, the AI can be persuaded via user feedback to "use the CLIP tokens to make a prompt for run_image", and will then only refrain from using blatantly offensive words like "r*pe". However, the CLIP opinion often includes chained "longword" tokens, e.g. "instarape" - which GPT accepts, and which the CLIP inside stable diffusion et al. will understand just as well. ...And likely an API filter, too.

You have been warned. Do whatever floats your boat, but keep it limited to your boat - and don't blame me for getting kick-banned from any text-to-image API. That's all. ❗

⚠️⚠️⚠️⚠️⚠️⚠️

Original README.MD

Auto-GPT: An Autonomous GPT-4 Experiment


💡 Get help - Q&A or Discord 💬


🔴 🔴 🔴 Urgent: USE stable not master 🔴 🔴 🔴

Download the latest stable release from here: https://github.com/Significant-Gravitas/Auto-GPT/releases/latest. The master branch may often be in a broken state.


Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, chains together LLM "thoughts", to autonomously achieve whatever goal you set. As one of the first examples of GPT-4 running fully autonomously, Auto-GPT pushes the boundaries of what is possible with AI.

Demo April 16th 2023

AutoGPTDemo_Subs_WithoutFinalScreen.mp4

Demo made by Blake Werlinger

💖 Help Fund Auto-GPT's Development 💖

If you can spare a coffee, you can help to cover the costs of developing Auto-GPT and help to push the boundaries of fully autonomous AI! Your support is greatly appreciated. Development of this free, open-source project is made possible by all the contributors and sponsors. If you'd like to sponsor this project and have your avatar or company logo appear below, click here.


robinicus  0xmatchmaker  jazgarewal  MayurVirkar  avy-ai  TheStoneMX  goldenrecursion  MatthewAgs  eelbaz  rapidstartup  gklab  VoiceBeer  DailyBotHQ  lucas-chu  knifour  refinery1  st617  neodenit  CrazySwami  Heitechsoft  RealChrisSean  abhinav-pandey29  Explorergt92  SparkplanAI  crizzler  kreativai  omphos  Jahmazon  tjarmain  ddtarazona  saten-private  anvarazizov  lazzacapital  m  Pythagora-io  Web3Capital  toverly1  digisomni  concreit  LeeRobidas  Josecodesalot  dexterityx  rickscode  Brodie0  FSTatSBS  nocodeclarity  jsolejr  amr-elsehemy  RawBanana  horazius  SwftCoins  tob-le-rone  RThaweewat  jun784  joaomdmoura  rejunity  mathewhawkins  caitlynmeeks  jd3655  Odin519Tomas  DataMetis  webbcolton  rocks6  cxs  fruition  nnkostov  morcos  pingbotan  maxxflyer  tommi-joentakanen  hunteraraujo  projectonegames  tullytim  comet-ml  thepok  prompthero  sunchongren  neverinstall  josephcmiller2  yx3110  MBassi91  SpacingLily  arthur-x88  ciscodebs  christian-gheorghe  EngageStrategies  jondwillis  Cameron-Fulton  AryaXAI  AuroraHolding  Mr-Bishop42  doverhq  johnculkin  marv-technology  ikarosai  ColinConwell  humungasaurus  terpsfreak  iddelacruz  thisisjeffchen  nicoguyon  arjunb023  Nalhos  belharethsami  Mobivs  txtr99  ntwrite  founderblocks-sils  kMag410  angiaou  garythebat  lmaugustin  shawnharmsen  clortegah  MetaPath01  sekomike910  MediConCenHK  svpermari0  jacobyoby  turintech  allenstecat  CatsMeow492  tommygeee  judegomila  cfarquhar  ZoneSixGames  kenndanielso  CrypteorCapital  sultanmeghji  jenius-eagle  josephjacks  pingshian0131  AIdevelopersAI  ternary5  ChrisDMT  AcountoOU  chatgpt-prompts  Partender  Daniel1357  KiaArmani  zkonduit  fabrietech  scryptedinc  coreyspagnoli  AntonioCiolino  Dradstone  CarmenCocoa  bentoml  merwanehamadi  vkozacek  ASmithOWL  tekelsey  GalaxyVideoAgency  wenfengwang  rviramontes  indoor47  ZERO-A-ONE  

🚀 Features

  • 🌐 Internet access for searches and information gathering
  • 💾 Long-term and short-term memory management
  • 🧠 GPT-4 instances for text generation
  • 🔗 Access to popular websites and platforms
  • 🗃️ File storage and summarization with GPT-3.5
  • 🔌 Extensibility with Plugins

Quickstart

  1. Check out the wiki
  2. Get an OpenAI API Key
  3. Download the latest release
  4. Follow the installation instructions
  5. Configure any additional features you want, or install some plugins
  6. Run the app

Please see the documentation for full setup instructions and configuration options.

📖 Documentation

⚠️ Limitations

This experiment aims to showcase the potential of GPT-4 but comes with some limitations:

  1. Not a polished application or product, just an experiment
  2. May not perform well in complex, real-world business scenarios. In fact, if it actually does, please share your results!
  3. Quite expensive to run, so set and monitor your API key limits with OpenAI!

🛡 Disclaimer

This project, Auto-GPT, is an experimental application and is provided "as-is" without any warranty, express or implied. By using this software, you agree to assume all risks associated with its use, including but not limited to data loss, system failure, or any other issues that may arise.

The developers and contributors of this project do not accept any responsibility or liability for any losses, damages, or other consequences that may occur as a result of using this software. You are solely responsible for any decisions and actions taken based on the information provided by Auto-GPT.

Please note that the use of the GPT-4 language model can be expensive due to its token usage. By utilizing this project, you acknowledge that you are responsible for monitoring and managing your own token usage and the associated costs. It is highly recommended to check your OpenAI API usage regularly and set up any necessary limits or alerts to prevent unexpected charges.

As an autonomous experiment, Auto-GPT may generate content or take actions that are not in line with real-world business practices or legal requirements. It is your responsibility to ensure that any actions or decisions made based on the output of this software comply with all applicable laws, regulations, and ethical standards. The developers and contributors of this project shall not be held responsible for any consequences arising from the use of this software.

By using Auto-GPT, you agree to indemnify, defend, and hold harmless the developers, contributors, and any affiliated parties from and against any and all claims, damages, losses, liabilities, costs, and expenses (including reasonable attorneys' fees) arising from your use of this software or your violation of these terms.

🐦 Connect with Us on Twitter

Stay up-to-date with the latest news, updates, and insights about Auto-GPT by following our Twitter accounts. Engage with the developer and the AI's own account for interesting discussions, project updates, and more.

  • Developer: Follow @siggravitas for insights into the development process, project updates, and related topics from the creator of Entrepreneur-GPT.
  • Entrepreneur-GPT: Join the conversation with the AI itself by following @En_GPT. Share your experiences, discuss the AI's outputs, and engage with the growing community of users.

We look forward to connecting with you and hearing your thoughts, ideas, and experiences with Auto-GPT. Join us on Twitter and let's explore the future of AI together!

Star History Chart