
Feature Suggestion | Serverless Runpod #9

Open
GeoffMillerAZ opened this issue Nov 6, 2023 · 3 comments

Comments

@GeoffMillerAZ

I see that RunPod has a serverless option. Rather than stopping and starting these instances, is it possible to run these models serverlessly? It looks like you can modify TheBloke's Dockerfile and configure a network volume so the model lives in the volume's workspace. Roughly:

  • Dockerfile setup
  • create a network volume
  • create an instance on the network volume
  • download the model(s) into the instance, putting them in the volume
  • delete the instance
  • mount the volume to the serverless GPU endpoint docker template

I am trying to play with doing this, but I have been busy with work, and I don't know what I'm doing here. I have a lot of questions as to whether this is possible or practical. Does each request wait for the model to load into the VRAM?
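On the VRAM question, my guess is that if the model is loaded at import time rather than inside the request handler, only cold starts pay the load cost and warm workers reuse the already-loaded model. Here is a minimal sketch of a serverless worker using RunPod's Python SDK, assuming the network volume is mounted at /runpod-volume; the model directory is a placeholder and I haven't verified any of this against TheBloke's image:

```python
# handler.py - minimal sketch of a RunPod serverless worker serving a model
# stored on a network volume. The /runpod-volume mount path and model directory
# are assumptions/placeholders; transformers and torch must be in the image.
import runpod
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "/runpod-volume/models/Phind-CodeLlama-34B-v2"  # placeholder path

# Load once at module scope so only cold starts wait for the model to load;
# warm requests reuse it.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, device_map="auto")

def handler(event):
    prompt = event["input"]["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    return {"text": tokenizer.decode(output[0], skip_special_tokens=True)}

runpod.serverless.start({"handler": handler})
```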

Serverless could be a cheap and easy way to have permanent setups for using Autogen. This could be especially nice for having multiple serverless GPU endpoints for different AI models that specialize in specific tasks without having the cost or risk of leaving an instance running.

Also, can you set a custom API key for your RunPod endpoint, to make sure your endpoints don't get used by someone else?
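On the key question: as far as I can tell, serverless endpoints are invoked through api.runpod.ai and require your RunPod account API key in the Authorization header, so nobody without that key should be able to hit them. A rough sketch of calling an endpoint, with the endpoint ID as a placeholder:

```python
# Sketch of calling a RunPod serverless endpoint. ENDPOINT_ID is a placeholder
# and the API key comes from an environment variable. /runsync blocks until the
# job finishes; /run would queue it asynchronously.
import os
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

resp = requests.post(
    url,
    json={"input": {"prompt": "def fibonacci(n):"}},
    headers=headers,
    timeout=600,
)
resp.raise_for_status()
print(resp.json())
```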

@GeoffMillerAZ
Author

@PromptEngineer48
I have played with the templates a bit. It might be nice to include these template modifications in the README in your codebase. Avoiding the UI is always nice =)

| Environment Variable | Value                        |
| -------------------- | ---------------------------- |
| MODEL                | Phind/Phind-CodeLlama-34B-v2 |
| UI_ARGS              | --extensions openai          |

This worked for me. I did get an error because the container is currently broken, so I had to go to the shell, do a pip install, and restart the container. But once TheBloke fixes the container, using the templates this way will be nice, and you won't have to restart the container when you select the openai plugin =) Later I'll play with this and serverless. Still not sure if that works...
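Once the openai extension is up, pointing Autogen at the pod is basically one config_list entry. A sketch, assuming the RunPod HTTP proxy URL format https://<POD_ID>-<PORT>.proxy.runpod.net and the pre-openai-1.0 style keys that pyautogen used around this time (newer versions use base_url instead of api_base); POD_ID is a placeholder:

```python
# Sketch of an Autogen config pointing at the pod's openai extension.
# POD_ID is a placeholder; api_base/api_type/api_key follow the older
# openai<1.0-style config that pyautogen expected in late 2023.
config_list = [
    {
        "model": "Phind/Phind-CodeLlama-34B-v2",
        "api_base": "https://POD_ID-5001.proxy.runpod.net/v1",
        "api_type": "open_ai",
        "api_key": "sk-not-needed",  # the extension ignores it, but a value is required
    }
]

# e.g.
# import autogen
# coder = autogen.AssistantAgent("coder", llm_config={"config_list": config_list})
```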

Note: if you go into your RunPod profile and set your public key, it will automagically get set on each template you launch, passed to Docker via environment variables.
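For what it's worth, the "automagic" part presumably boils down to the container start script appending that injected key to authorized_keys; a sketch of the idea (the PUBLIC_KEY variable name is an assumption based on RunPod's own templates, not something I've verified):

```python
# Sketch of what a start script could do with the injected SSH key.
# The PUBLIC_KEY env var name is an assumption, not verified.
import os
from pathlib import Path

key = os.environ.get("PUBLIC_KEY")
if key:
    ssh_dir = Path.home() / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    auth_keys = ssh_dir / "authorized_keys"
    with auth_keys.open("a") as f:
        f.write(key.rstrip() + "\n")
    auth_keys.chmod(0o600)
```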


@arianyambao

Hi @GeoffMillerAZ, this is amazing. I've been trying to make this work since yesterday and did all of the steps provided in the video, but:

  1. My port 5001 doesn't get exposed at all.
  2. Whenever I try to point the inference at port 5000, it returns empty responses in my Autogen.

Would it be okay to ask for your guidance on how you made this work? Thank you so much.

@PromptEngineer48
Owner

Things have changed. This should work: add an environment variable called UI_ARGS to your pod with a value of --extensions openai --api-port 5001.
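If it helps with the empty-response problem above: once UI_ARGS is set and 5001 is listed in the template's exposed HTTP ports, you can sanity-check the extension directly before involving Autogen. A sketch with POD_ID as a placeholder, assuming the extension exposes the usual OpenAI-compatible routes:

```python
# Quick sanity check of the openai extension on port 5001. POD_ID is a placeholder.
import requests

base = "https://POD_ID-5001.proxy.runpod.net/v1"

# An error or empty list here usually means the port isn't exposed
# or the extension didn't start.
print(requests.get(f"{base}/models", timeout=60).json())

# Then try a tiny chat completion.
payload = {
    "model": "Phind/Phind-CodeLlama-34B-v2",
    "messages": [{"role": "user", "content": "Say hi in one word."}],
    "max_tokens": 16,
}
print(requests.post(f"{base}/chat/completions", json=payload, timeout=120).json())
```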
