
Auto-GPT with CLIP vision for GPT-4 and GPT-3.5


A pseudo-multimodal Auto-GPT

(We're essentially just prompting GPT with CLIP's "opinion" tokens for what it "sees" in an image, which is nevertheless surprisingly effective!)

  • GPT autonomously prompts text-to-image and text-to-3D with infinite variations
  • GPT visual websearch based on image content
  • your_idea_here

Please tag me on Twitter: @zer0int1 if you use this for anything; I'd love to see it!

Demo videos:


Steps to install / use

Prerequisite:

Optional Prerequisites:

Depending on your use case, you might also need:

Other third-party credits:

🤔❗ Usage:

  1. Ensure the prerequisites (the repos above) are installed and working
  2. Put this repo (git clone it, or download and extract the ZIP) in C:/Users/JohnDoe or your equivalent "user home" folder ("z" is no longer a null byte; I removed it, so your username can contain a "z", like mine)
  3. Edit Auto-GPT/auto_gpt_workspace/CLIP.py according to your hardware (VRAM requirement table inside; yes, you can run a small CLIP with 6 GB of VRAM!)
  4. From Auto-GPT/auto_gpt_workspace, copy CLIP.py and (if applicable) SHAPE.py to your C:/Users/JohnDoe (one level above the Auto-GPT folder)
  5. Edit Auto-GPT/autogpt/visionconfig.py (should be straightforward and self-explanatory; define the absolute paths on your local system in this config)
  6. In your .env, remove the comment and set EXECUTE_LOCAL_COMMANDS=True and RESTRICT_TO_WORKSPACE=False (see the snippet after this list)
  7. Pick one of the .yaml files from Auto-GPT/ai_settings_examples (see README.TXT for details!), copy it to the main Auto-GPT folder, and rename it so you have e.g. C:/Users/JohnDoe/Auto-GPT/ai_settings.yaml
  8. (Optional) Put your own images (filenames as per the ai_settings.yaml) in the Auto-GPT/images folder (or use the example images I provided)
  9. (Optional, if running local stable diffusion) Edit Auto-GPT/auto_gpt_workspace/stablediffusion.py to match the model / config of the SD you want to use
  10. (Optional, recommended) Make sure everything is working by running the scripts independently, outside of Auto-GPT (see below)
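For step 6, the relevant lines in your .env should end up exactly like this:

EXECUTE_LOCAL_COMMANDS=True
RESTRICT_TO_WORKSPACE=False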

Verify everything works via cmd -> cd into Auto-GPT/auto_gpt_workspace and run:

python CLIPrun.py --image_path "C:/Users/JohnDoe/Auto-GPT/images/0001.png"

Replace JohnDoe with your username and wait a minute or two (depending on your GPU and the settings you made in CLIP.py) -> You should now have a CLIP opinion as tokens_0001.txt in Auto-GPT/auto_gpt_workspace
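If you'd rather script that check, here's a tiny sketch (it assumes the layout from step 2, i.e. Auto-GPT directly in your user home folder):

from pathlib import Path

# Assumes the step-2 layout: Auto-GPT sits directly in your user home folder.
tokens = Path.home() / "Auto-GPT" / "auto_gpt_workspace" / "tokens_0001.txt"
print(tokens.read_text() if tokens.exists() else "tokens_0001.txt not found - re-check CLIP.py")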

Optional:

python SHAPErun.py --prompt "A Pontiac Firebird car"

You should find s_000.png and the respective .ply in the Auto-GPT/images folder. When you run this FOR THE FIRST TIME EVER, it will download & build the Shap-E models in Auto-GPT/auto_gpt_workspace/shap_e_model_cache --> Please be patient (minutes, depending on your internet speed)!!


(Example GradCAM heatmaps: 0001-RN50x4_L1-exploemails, 0001-RN50x4_L4-sett)

🤖👀 BONUS: See what CLIP sees by computing (fast!) a heatmap highlighting which regions of the image activate most strongly for a given caption.

python manual_gradcam.py --image "0001.png" --txt "tokens_0001.txt"

Caption = the tokens CLIP 'saw' in the image (the "opinion" tokens_XXXXX.txt returned to GPT by using "run_clip" on XXXXX.png in Auto-GPT). If you're wondering WTF CLIP saw in your image, and where, run this in a separate command prompt "on the side", matching whatever GPT last used in Auto-GPT. It will dump heatmap images for all CLIP tokens, across all four saliency layers of the CLIP model, into the Auto-GPT/GradCAM folder. For GradCAM requirements, see Auto-GPT/autogpt/commands/CLIP_gradcam.py -- an adaptation of an ipynb notebook; the pip install requirements are left in as comments at the very top.

ToDo: Implement this as a "y -D" option that Auto-GPT accepts (same as "y -N"), to execute automatically after the next time run_clip is executed.
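Until that ToDo lands, here is a minimal, untested sketch for batch-running the heatmaps over every CLIP "opinion" in the workspace (the paths and the tokens_*.txt naming are taken from the steps above, and it assumes manual_gradcam.py is run from inside auto_gpt_workspace, as with CLIPrun.py; adjust if yours differ):

import subprocess
from pathlib import Path

# Assumes the step-2 layout: Auto-GPT sits directly in your user home folder.
workspace = Path.home() / "Auto-GPT" / "auto_gpt_workspace"

# For every CLIP "opinion" tokens_XXXXX.txt, render the heatmaps for XXXXX.png.
for tokens in sorted(workspace.glob("tokens_*.txt")):
    image = tokens.name.replace("tokens_", "").replace(".txt", ".png")
    subprocess.run(
        ["python", "manual_gradcam.py", "--image", image, "--txt", tokens.name],
        cwd=workspace,  # run from inside auto_gpt_workspace
        check=True,
    )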


❓ Important tips and troubleshooting, including model limitations

  • Oddly enough, the relative output path .\auto_gpt_workspace\clip_tokens.txt ensures GPT-3.5 does not get confused about where the CLIP token "opinion" is. GPT-4, however, will try to read_file from the wrong place once, in the beginning. Simply approve with y; the AI will "think" about file-not-found, correct itself, and never make that mistake again. Sorry about a small waste of GPT-4 tokens - but this is the best way to make sure it works out of the box for both GPT models.

  • Make sure the folder structure is exactly as described above. Executing subprocesses (.py files) from different locations is a delicate thing.

  • Delete the auto-gpt.json in Auto-GPT/auto_gpt_workspace if you change the .yaml or encounter issues.

  • For local stable diffusion: I am using shutil.copyfile instead of shutil.move, meaning I am trashing up your stablediffusion/outputs folder as I copy the images to Auto-GPT/images. Why? Because I couldn't figure out a way to ONLY check for existing stable diffusion images, e.g. 00001.png but NOT 0001.png. Naming the files slightly more complexly, like SD_00001.png, instead confuses GPT-3.5. So: better a bit of redundant trash than overwritten files, right? Feel free to implement something that works, if you know how - I'd be delighted! (One possible approach is sketched below.)
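If you want to take a stab at that, one possible approach (an untested sketch, not what stablediffusion.py currently does) is to match only stable diffusion's five-digit filenames, so Auto-GPT's four-digit images can never collide, and to skip anything already copied:

import re
import shutil
from pathlib import Path

# Stable diffusion outputs are named with five digits (00001.png), while
# Auto-GPT's own images use four (0001.png), so this pattern cannot match them.
SD_NAME = re.compile(r"^\d{5}\.png$")

def copy_new_sd_images(sd_outputs: Path, images_dir: Path) -> None:
    for src in sd_outputs.iterdir():
        # Copy each five-digit image exactly once; never overwrite existing files.
        if SD_NAME.match(src.name) and not (images_dir / src.name).exists():
            shutil.copyfile(src, images_dir / src.name)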


CLIP IS UNCENSORED

⚠️💣💥⚠️

  • CLIP SEES WHATEVER CLIP WANTS TO SEE (doesn't have to be related to what you see) 🤯

⚠️⚠️ WARNING ABOUT "BIAS" AND "HARMFUL" OUTPUT IN PRE-TRAINED, UNCENSORED CLIP MODELS.

While you probably shouldn't run Auto-GPT in "autonomous" mode anyway, you'll also want to ACTUALLY proof-read the GPT-generated prompt ❗❗ carefully, rather than just approving it! That is especially true if you are not running locally: spamming offensive words might just get you banned from a text-to-image API ❗❗

So a harmless image (your opinion) might lead to offensive, racist, biased, sexist output (CLIP's opinion) ❗. This is especially true if non-English text is present in the image.

  • 👉 More info on typographic attacks and why CLIP is so obsessed with text: Multimodal Neurons
  • 👉 Check the model-card.md and heed the warnings from OpenAI: CLIP Model Card

Use the above CLIPrun.py with pepe.png for an example that shouldn't be too toxic, but proves a point with regard to "oh yes, CLIP knows - CLIP was trained on the internet".
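That is (assuming pepe.png is among the example images in Auto-GPT/images):

python CLIPrun.py --image_path "C:/Users/JohnDoe/Auto-GPT/images/pepe.png"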

PS: And yes, GPT-3.5 / GPT-4 will accept these terms and make a prompt with them. They might conclude "the CLIP opinion is not very useful" and try to do something else; however, the AI can be persuaded via user feedback to "use the CLIP tokens to make a prompt for run_image", and will then only refrain from using blatantly offensive words like "r*pe". However, the CLIP opinion often includes chained "longword" tokens, e.g. "instarape" - which GPT accepts, and which the CLIP inside stable diffusion et al. will understand just as well. ...And likely an API filter, too.

You have been warned. Do whatever floats your boat, but keep it limited to your boat - and don't blame me for getting kick-banned from any text-to-image API. That's all. ❗

⚠️⚠️⚠️⚠️⚠️⚠️

Original README.MD

Auto-GPT: An Autonomous GPT-4 Experiment


💡 Get help - Q&A or Discord 💬


🔴 🔴 🔴 Urgent: USE stable not master 🔴 🔴 🔴

Download the latest stable release from here: https://github.com/Significant-Gravitas/Auto-GPT/releases/latest. The master branch may often be in a broken state.


Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, chains together LLM "thoughts", to autonomously achieve whatever goal you set. As one of the first examples of GPT-4 running fully autonomously, Auto-GPT pushes the boundaries of what is possible with AI.

Demo April 16th 2023

AutoGPTDemo_Subs_WithoutFinalScreen.mp4

Demo made by Blake Werlinger

💖 Help Fund Auto-GPT's Development 💖

If you can spare a coffee, you can help to cover the costs of developing Auto-GPT and help to push the boundaries of fully autonomous AI! Your support is greatly appreciated. Development of this free, open-source project is made possible by all the contributors and sponsors. If you'd like to sponsor this project and have your avatar or company logo appear below, click here.


robinicus  0xmatchmaker  jazgarewal  MayurVirkar  avy-ai  TheStoneMX  goldenrecursion  MatthewAgs  eelbaz  rapidstartup  gklab  VoiceBeer  DailyBotHQ  lucas-chu  knifour  refinery1  st617  neodenit  CrazySwami  Heitechsoft  RealChrisSean  abhinav-pandey29  Explorergt92  SparkplanAI  crizzler  kreativai  omphos  Jahmazon  tjarmain  ddtarazona  saten-private  anvarazizov  lazzacapital  m  Pythagora-io  Web3Capital  toverly1  digisomni  concreit  LeeRobidas  Josecodesalot  dexterityx  rickscode  Brodie0  FSTatSBS  nocodeclarity  jsolejr  amr-elsehemy  RawBanana  horazius  SwftCoins  tob-le-rone  RThaweewat  jun784  joaomdmoura  rejunity  mathewhawkins  caitlynmeeks  jd3655  Odin519Tomas  DataMetis  webbcolton  rocks6  cxs  fruition  nnkostov  morcos  pingbotan  maxxflyer  tommi-joentakanen  hunteraraujo  projectonegames  tullytim  comet-ml  thepok  prompthero  sunchongren  neverinstall  josephcmiller2  yx3110  MBassi91  SpacingLily  arthur-x88  ciscodebs  christian-gheorghe  EngageStrategies  jondwillis  Cameron-Fulton  AryaXAI  AuroraHolding  Mr-Bishop42  doverhq  johnculkin  marv-technology  ikarosai  ColinConwell  humungasaurus  terpsfreak  iddelacruz  thisisjeffchen  nicoguyon  arjunb023  Nalhos  belharethsami  Mobivs  txtr99  ntwrite  founderblocks-sils  kMag410  angiaou  garythebat  lmaugustin  shawnharmsen  clortegah  MetaPath01  sekomike910  MediConCenHK  svpermari0  jacobyoby  turintech  allenstecat  CatsMeow492  tommygeee  judegomila  cfarquhar  ZoneSixGames  kenndanielso  CrypteorCapital  sultanmeghji  jenius-eagle  josephjacks  pingshian0131  AIdevelopersAI  ternary5  ChrisDMT  AcountoOU  chatgpt-prompts  Partender  Daniel1357  KiaArmani  zkonduit  fabrietech  scryptedinc  coreyspagnoli  AntonioCiolino  Dradstone  CarmenCocoa  bentoml  merwanehamadi  vkozacek  ASmithOWL  tekelsey  GalaxyVideoAgency  wenfengwang  rviramontes  indoor47  ZERO-A-ONE  

🚀 Features

  • 🌐 Internet access for searches and information gathering
  • 💾 Long-term and short-term memory management
  • 🧠 GPT-4 instances for text generation
  • 🔗 Access to popular websites and platforms
  • 🗃️ File storage and summarization with GPT-3.5
  • 🔌 Extensibility with Plugins

Quickstart

  1. Check out the wiki
  2. Get an OpenAI API Key
  3. Download the latest release
  4. Follow the installation instructions
  5. Configure any additional features you want, or install some plugins
  6. Run the app

Please see the documentation for full setup instructions and configuration options.

📖 Documentation

⚠️ Limitations

This experiment aims to showcase the potential of GPT-4 but comes with some limitations:

  1. Not a polished application or product, just an experiment
  2. May not perform well in complex, real-world business scenarios. In fact, if it actually does, please share your results!
  3. Quite expensive to run, so set and monitor your API key limits with OpenAI!

🛡 Disclaimer

This project, Auto-GPT, is an experimental application and is provided "as-is" without any warranty, express or implied. By using this software, you agree to assume all risks associated with its use, including but not limited to data loss, system failure, or any other issues that may arise.

The developers and contributors of this project do not accept any responsibility or liability for any losses, damages, or other consequences that may occur as a result of using this software. You are solely responsible for any decisions and actions taken based on the information provided by Auto-GPT.

Please note that the use of the GPT-4 language model can be expensive due to its token usage. By utilizing this project, you acknowledge that you are responsible for monitoring and managing your own token usage and the associated costs. It is highly recommended to check your OpenAI API usage regularly and set up any necessary limits or alerts to prevent unexpected charges.

As an autonomous experiment, Auto-GPT may generate content or take actions that are not in line with real-world business practices or legal requirements. It is your responsibility to ensure that any actions or decisions made based on the output of this software comply with all applicable laws, regulations, and ethical standards. The developers and contributors of this project shall not be held responsible for any consequences arising from the use of this software.

By using Auto-GPT, you agree to indemnify, defend, and hold harmless the developers, contributors, and any affiliated parties from and against any and all claims, damages, losses, liabilities, costs, and expenses (including reasonable attorneys' fees) arising from your use of this software or your violation of these terms.

🐦 Connect with Us on Twitter

Stay up-to-date with the latest news, updates, and insights about Auto-GPT by following our Twitter accounts. Engage with the developer and the AI's own account for interesting discussions, project updates, and more.

  • Developer: Follow @siggravitas for insights into the development process, project updates, and related topics from the creator of Entrepreneur-GPT.
  • Entrepreneur-GPT: Join the conversation with the AI itself by following @En_GPT. Share your experiences, discuss the AI's outputs, and engage with the growing community of users.

We look forward to connecting with you and hearing your thoughts, ideas, and experiences with Auto-GPT. Join us on Twitter and let's explore the future of AI together!

Star History Chart