LLMLingua Prompt Compression | RunPod

🚀 | A custom RunPod Serverless Endpoint template that uses LLMLingua with Microsoft Phi-2 to compress prompts in seconds.

📖 | Getting Started

  1. Navigate to RunPod Serverless: RUNPOD

  2. Create an endpoint.

  3. Enter jlonge4/runpod-llmlingua:v3 as your image.

  4. Check out the test notebook for an example of sending a context-compression request (a minimal request sketch also follows this list).

  5. Send your compressed query to your LLM!

  6. Alternatively, deploy using my template here: TEMPLATE
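
Once the endpoint is live, a request is a plain HTTP POST to RunPod's `runsync` route. Below is a minimal sketch using `requests`; the endpoint ID and API key are placeholders, and the payload mirrors the Example Input below:

```python
import requests

RUNPOD_API_KEY = "YOUR_RUNPOD_API_KEY"  # placeholder: your RunPod API key
ENDPOINT_ID = "YOUR_ENDPOINT_ID"        # placeholder: shown on the endpoint page

payload = {
    "input": {
        "context": "[context]",
        "instruction": "You are a q/a bot who uses the provided context to answer a question",
        "question": "What's the purpose of the tutorial?",
        "target_tokens": 350,
    }
}

# /runsync blocks until the worker finishes; use /run + /status for async jobs
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
    json=payload,
    timeout=120,
)
result = resp.json()["output"]
print(result["compressed_prompt"])
```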

Example Input

{
    "input": {
        "context": "[context]",
        "instruction": "You are a q/a bot who uses the provided context to answer a question",
        "question": "What's the purpose of the tutorial?",
        "target_tokens": 350,
    }
}
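
Under the hood, the worker presumably hands these fields to llmlingua's PromptCompressor. Here is a minimal sketch of the server-side call, assuming the standard llmlingua API; the exact handler in this repo may differ, and the input key target_tokens is assumed to map onto llmlingua's target_token parameter:

```python
from llmlingua import PromptCompressor

# Load Phi-2 as the small compression model, per this template's description
compressor = PromptCompressor(model_name="microsoft/phi-2")

result = compressor.compress_prompt(
    "[context]",  # the long context to compress
    instruction="You are a q/a bot who uses the provided context to answer a question",
    question="What's the purpose of the tutorial?",
    target_token=350,  # llmlingua's name for the "target_tokens" input field
)
# result includes compressed_prompt, origin_tokens, compressed_tokens, ratio, saving
print(result["compressed_prompt"])
```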

🚀 | Execution Result

  • Wall time: 1.25 s
{
    "compressed_prompt": "You are a question answering bot who uses the provided\ncontext to answer a question\nIn this short will explore how Face be deployed in a\nDocker Container and a service...\nWhat's the purpose of the tutorial?",
    "compressed_tokens": 788,
    "origin_tokens": 2171,
    "ratio": "2.8x",
    "saving": "Saving $0.1 in GPT-4."
}
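
Step 5 then amounts to forwarding compressed_prompt to whatever LLM you use. As an illustration only (continuing from the request sketch above, where `result` is the parsed endpoint output), with the OpenAI client:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4",  # the "saving" estimate above assumes GPT-4 pricing
    messages=[{"role": "user", "content": result["compressed_prompt"]}],
)
print(completion.choices[0].message.content)
```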

🔗 | Links

🐳 Docker Container
