
Feature Suggestion | Serverless Runpod #9

Open
GeoffMillerAZ opened this issue Nov 6, 2023 · 3 comments

Comments

@GeoffMillerAZ

I see that RunPod has a serverless option. Rather than stopping and starting these instances, is it possible to run these models serverlessly? It looks like you can modify TheBloke's Dockerfile and configure a network volume so the model lives in the volume's workspace. Roughly:

  • Dockerfile setup
  • create a network volume
  • create an instance on the network volume
  • download the model(s) into the instance, putting them in the volume
  • delete the instance
  • mount the volume to the serverless GPU endpoint docker template

I am trying to play with doing this, but I have been busy with work, and I don't know what I'm doing here. I have a lot of questions as to whether this is possible or practical. Does each request wait for the model to load into the VRAM?
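On the VRAM question, my guess is that if the model is loaded at import time rather than inside the request handler, only cold starts pay the load cost and warm workers reuse the already-loaded model. Here is a minimal sketch of a serverless worker using RunPod's Python SDK, assuming the network volume is mounted at /runpod-volume; the model directory is a placeholder and I haven't verified any of this against TheBloke's image:

```python
# handler.py - minimal sketch of a RunPod serverless worker serving a model
# stored on a network volume. The /runpod-volume mount path and model directory
# are assumptions/placeholders; transformers and torch must be in the image.
import runpod
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "/runpod-volume/models/Phind-CodeLlama-34B-v2"  # placeholder path

# Load once at module scope so only cold starts wait for the model to load;
# warm requests reuse it.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, device_map="auto")

def handler(event):
    prompt = event["input"]["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    return {"text": tokenizer.decode(output[0], skip_special_tokens=True)}

runpod.serverless.start({"handler": handler})
```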

Serverless could be a cheap and easy way to have permanent setups for using Autogen. This could be especially nice for having multiple serverless GPU endpoints for different AI models that specialize in specific tasks without having the cost or risk of leaving an instance running.

Also, can you set a custom API key for your RunPod endpoint, to make sure your endpoints don't get used by someone else?
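On the key question: as far as I can tell, serverless endpoints are invoked through api.runpod.ai and require your RunPod account API key in the Authorization header, so nobody without that key should be able to hit them. A rough sketch of calling an endpoint, with the endpoint ID as a placeholder:

```python
# Sketch of calling a RunPod serverless endpoint. ENDPOINT_ID is a placeholder
# and the API key comes from an environment variable. /runsync blocks until the
# job finishes; /run would queue it asynchronously.
import os
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

resp = requests.post(
    url,
    json={"input": {"prompt": "def fibonacci(n):"}},
    headers=headers,
    timeout=600,
)
resp.raise_for_status()
print(resp.json())
```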

@GeoffMillerAZ
Author

@PromptEngineer48
I have played with the templates a bit. It might be nice to include these template modifications in the README in your codebase. Avoiding the UI is always nice =)

| Environment Variable | Value                        |
| -------------------- | ---------------------------- |
| MODEL                | Phind/Phind-CodeLlama-34B-v2 |
| UI_ARGS              | --extensions openai          |

This worked for me. I did get an error because the container is currently broken, so I had to go to the shell, do a pip install, and restart the container. But once TheBloke fixes the container, using the templates this way will be nice, and you won't have to restart the container when you select the openai plugin =) Later I'll play with this and serverless. Still not sure if that works...
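Once the openai extension is up, pointing Autogen at the pod is basically one config_list entry. A sketch, assuming the RunPod HTTP proxy URL format https://<POD_ID>-<PORT>.proxy.runpod.net and the pre-openai-1.0 style keys that pyautogen used around this time (newer versions use base_url instead of api_base); POD_ID is a placeholder:

```python
# Sketch of an Autogen config pointing at the pod's openai extension.
# POD_ID is a placeholder; api_base/api_type/api_key follow the older
# openai<1.0-style config that pyautogen expected in late 2023.
config_list = [
    {
        "model": "Phind/Phind-CodeLlama-34B-v2",
        "api_base": "https://POD_ID-5001.proxy.runpod.net/v1",
        "api_type": "open_ai",
        "api_key": "sk-not-needed",  # the extension ignores it, but a value is required
    }
]

# e.g.
# import autogen
# coder = autogen.AssistantAgent("coder", llm_config={"config_list": config_list})
```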

Note: if you go into your RunPod profile and set your public key, it will automagically get set on each template you launch, passed to Docker via environment variables.
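For what it's worth, the "automagic" part presumably boils down to the container start script appending that injected key to authorized_keys; a sketch of the idea (the PUBLIC_KEY variable name is an assumption based on RunPod's own templates, not something I've verified):

```python
# Sketch of what a start script could do with the injected SSH key.
# The PUBLIC_KEY env var name is an assumption, not verified.
import os
from pathlib import Path

key = os.environ.get("PUBLIC_KEY")
if key:
    ssh_dir = Path.home() / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    auth_keys = ssh_dir / "authorized_keys"
    with auth_keys.open("a") as f:
        f.write(key.rstrip() + "\n")
    auth_keys.chmod(0o600)
```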


@arianyambao

Hi @GeoffMillerAZ, this is amazing. I've been trying to make this work since yesterday and did all of the steps provided in the video, but:

  1. My port 5001 doesn't get exposed at all.
  2. Whenever I try to point the inference at port 5000, it returns empty responses in my Autogen.

Would it be okay to ask for your guidance on how you made this work? Thank you so much.

@PromptEngineer48
Owner

Things have changed. This should work: add an environment variable called UI_ARGS to your pod with a value of --extensions openai --api-port 5001.
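If it helps with the empty-response problem above: once UI_ARGS is set and 5001 is listed in the template's exposed HTTP ports, you can sanity-check the extension directly before involving Autogen. A sketch with POD_ID as a placeholder, assuming the extension exposes the usual OpenAI-compatible routes:

```python
# Quick sanity check of the openai extension on port 5001. POD_ID is a placeholder.
import requests

base = "https://POD_ID-5001.proxy.runpod.net/v1"

# An error or empty list here usually means the port isn't exposed
# or the extension didn't start.
print(requests.get(f"{base}/models", timeout=60).json())

# Then try a tiny chat completion.
payload = {
    "model": "Phind/Phind-CodeLlama-34B-v2",
    "messages": [{"role": "user", "content": "Say hi in one word."}],
    "max_tokens": 16,
}
print(requests.post(f"{base}/chat/completions", json=payload, timeout=120).json())
```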
